Abstract

We consider transmitting a source across a pair of independent, non-ergodic channels with random 
states (e.g., slow fading channels) so as to minimize the average distortion. The general problem is 
unsolved. Hence, we focus on comparing two commonly used source and channel encoding systems 
which correspond to exploiting diversity either at the physical layer through parallel channel coding or 
at the application layer through multiple description source coding. 

For on-off channel models, source coding diversity offers better performance. For channels with 
a continuous range of reception quality, we show the reverse is true. Specifically, we introduce a new 
figure of merit called the distortion exponent which measures how fast the average distortion decays with 
SNR. For continuous-state models such as additive white Gaussian noise channels with multiplicative 
Rayleigh fading, optimal channel coding diversity at the physical layer is more efficient than source 
coding diversity at the application layer in that the former achieves a better distortion exponent. 

Finally, we consider a third decoding architecture: multiple description encoding with joint source-channel decoding. We show that this architecture achieves the same distortion exponent as systems with optimal channel coding diversity for continuous-state channels, and maintains the advantages of multiple description systems for on-off channels. Thus, the multiple description system with joint decoding achieves the best performance, from among the three architectures considered, on both continuous-state and on-off channels.
I. Introduction

Consider transmitting a source such as audio, video, or speech over a wireless link. Due to the nature 
of wireless channels, effects such as fading, shadowing, interference from other transmitters, and network 
congestion can cause the channel quality to fluctuate during transmission. When the channel varies on 
a time-scale longer than the delay constraints of the desired application, such channel fluctuations cause 
outages. Specifically, when the channel quality is too low, the receiver will be unable to decode the 
transmitted data in time to reconstruct it at the appropriate point in the source stream. Thus some frames 
of video or segments of speech/audio will be reconstructed at the receiver with large distortions. 

As illustrated in Fig. 1, one approach to combat such channel fluctuations is to code over multiple parallel channels (e.g., different frequency bands, antennas, or time slots) and leverage diversity in the channel. A variety of source and channel coding schemes can be applied to this scenario, including progressive and multiple description source codes [1]-[30], broadcast channel codes [31]-[36], and hybrid analog-digital codes [37, Chapter 3], [38]-[41]; however, the best source and channel coding architecture to exploit such parallel channels is still unknown. In this paper, we examine system architectures based upon two encoding algorithms that exploit diversity in the source coding and channel coding, respectively, along with two compatible decoding algorithms for the first encoder and one compatible decoding algorithm for the second encoder. We compare these systems by studying their average distortion on various block fading channel models.




Fig. 1. Conceptual illustration of the parallel diversity coding problem considered in this paper. An encoder must map a source sequence, $\mathbf{s}$, into a pair of channel inputs $\mathbf{x}_1$ and $\mathbf{x}_2$ without knowing the channel states $\mathsf{a}_1$ and $\mathsf{a}_2$. A decoder must map the channel outputs $\mathbf{y}_1$ and $\mathbf{y}_2$, along with knowledge of the channel states, into an estimate of the source, $\hat{\mathbf{s}}$. The optimal encoding and decoding architecture is unknown.

More specifically, Fig. 2 illustrates the two classes of encoders we consider. In the channel coding diversity system of Fig. 2(a), the source $\mathbf{s}$ is encoded into $\hat{\mathbf{s}}$ by a single description (SD) source coder. Next, $\hat{\mathbf{s}}$ is jointly encoded into $(\mathbf{x}_1, \mathbf{x}_2)$ by the channel coder and transmitted across a parallel channel. For the source coding diversity system of Fig. 2(b), the source $\mathbf{s}$ is encoded into $\hat{\mathbf{s}}_1$ and $\hat{\mathbf{s}}_2$ by a multiple description (MD) source coder. Each $\hat{\mathbf{s}}_i$ is then separately encoded into $\mathbf{x}_i$ by a channel coder and transmitted across the appropriate channel.

Fig. 2. Block diagrams for (a) channel coding diversity and (b) source coding diversity.

Since the encoders in Fig. 2 exploit the inherent diversity of a parallel channel in qualitatively different ways, we focus on the following two questions:

1) Which of the basic architectures in Fig. 2 achieves the smallest average distortion? If neither architecture is universally best, for which channels is one architecture better than the other?

2) Is there a way to combine the best features of both systems in Fig. 2?

Essentially, the answers we develop can be illustrated through Fig. 3. For channel coding diversity, the source codeword, $\hat{\mathbf{s}}$, can be reliably decoded only if the total channel quality is high enough to support the transmission rate. This system achieves diversity in the sense that, even if one of the channels is bad, the receiver will still be able to recover the encoded source as long as the overall channel quality is good. In contrast, for source coding diversity, each source codeword $\hat{\mathbf{s}}_i$ can be decoded if the quality of the corresponding individual channel is high enough. This system achieves diversity in the sense that, even if one of the channels is bad and one description is unrecoverable, a low-fidelity source reconstruction is obtained as long as the other channel is good and the remaining description is recovered. If both channels are good and both descriptions are successfully decoded, they are combined to form a high-fidelity reconstruction.

Fig. 3 compares the two systems when the source coders are designed to achieve the same distortion if all source codewords are successfully decoded (i.e., in region III). Furthermore, in region I, both systems fail to decode and again have the same distortion. In regions II and V, channel coding diversity is superior, since the channel conditions are such that at most one source codeword is decoded under source coding diversity. Conversely, in region IV, source coding diversity is superior, since one source codeword is received and channel coding diversity fails to decode. Therefore, our first question, about which of the architectures in Fig. 2 is best, is essentially a question about which region the channel quality is most likely to lie in. If regions II and V are more probable, channel coding diversity will be superior; conversely, if region IV is more likely, source coding diversity will be superior.

Fig. 3. Conceptual illustration of successful decoding regions for source and channel coding diversity systems designed to have the same distortion when all codewords are received (horizontal axis: quality of channel 1). For channel coding diversity, the receiver will be able to decode the transmitted source description if the sum of the channel qualities exceeds a threshold represented by the solid diagonal line. For source coding diversity, the first (respectively, second) source description will be successfully decoded provided the first (resp., second) channel quality exceeds the vertical (resp., horizontal) dashed line. The ♦'s represent the four possible channel qualities for a packet loss channel where each channel is either on or off.
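The decoding rules summarized in Fig. 3 can be made concrete with a short sketch. It assumes each channel's quality is summarized by a single number (for instance, its mutual information in nats per channel use) and uses hypothetical thresholds `r_cc` (the diagonal line for channel coding diversity) and `r1`, `r2` (the dashed lines for the two descriptions); the threshold values below are illustrative, not taken from the paper.

```python
def decoded_codewords(q1: float, q2: float, r_cc: float, r1: float, r2: float) -> dict:
    """Count source codewords recovered at channel qualities (q1, q2).

    Channel coding diversity decodes its single codeword when the total
    quality q1 + q2 reaches r_cc (solid diagonal line in Fig. 3).
    Source coding diversity decodes description i when q_i >= r_i
    (vertical and horizontal dashed lines in Fig. 3).
    """
    channel_coding = 1 if q1 + q2 >= r_cc else 0
    source_coding = int(q1 >= r1) + int(q2 >= r2)
    return {"channel_coding": channel_coding, "source_coding": source_coding}


# On-off example (the four diamonds in Fig. 3): each channel is off (0) or on (1).
# Illustrative thresholds with r1 + r2 > r_cc, reflecting the extra total rate an
# MD code needs to match the SD code's distortion when everything is received.
for q1 in (0.0, 1.0):
    for q2 in (0.0, 1.0):
        print((q1, q2), decoded_codewords(q1, q2, r_cc=1.4, r1=0.8, r2=0.8))
```

With one channel on and one off, the sketch reports no codeword for channel coding diversity and one for source coding diversity, which is the region IV behavior described above; a point such as (0.75, 0.75) is instead decoded only by channel coding diversity.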

As a specific example, in the classic MD coding problem modeling link failure or packet erasure [28], each channel is either off, in which case no information can be communicated, or supports a particular rate. The four channel conditions for this scenario are indicated by ♦'s in Fig. 3 for an example packet erasure channel. For such discrete models, source coding diversity is clearly superior, since both SD and MD source coding achieve the same distortions in regions I and III, but channel coding diversity fails completely in region IV. In this region, source coding diversity recovers one source codeword and produces a low-fidelity reconstruction of the source.

February 1, 2008
DRAFT
SUBMITTED TO IEEE TRANS. ON INFORM. THEORY

The opposite occurs for channels where a continuous range of rates can potentially be supported (e.g., 
additive white Gaussian noise channels with Rayleigh fading). For these channels, the channel quality 
is essentially more likely to lie in region II than in IV and thus channel coding diversity is superior. 
Specifically, we characterize performance by analyzing how quickly the average distortion decays as a 
function of the signal-to-noise ratio (SNR) for various systems. We refer to the negative of the high-SNR slope of the distortion versus SNR on a log-log plot as the "distortion exponent" and use this as our figure of merit. In particular,
our analysis shows that optimal channel coding diversity is generally superior to source coding diversity 
on continuous channels in the sense that an optimal channel coding diversity architecture achieves a 
better distortion exponent than a source coding diversity architecture. 
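In practice, the distortion exponent can be estimated from average distortions evaluated (analytically or by simulation) at several high SNR values, as the negated slope of a least-squares line through the points $(\ln \mathrm{SNR}, \ln D)$. The sketch below is a generic helper under that assumption; the function name and the synthetic data are illustrative.

```python
import math


def distortion_exponent(snrs, distortions):
    """Estimate the distortion exponent as minus the slope of ln D versus ln SNR.

    snrs, distortions: equal-length sequences of positive values, ideally taken
    from the high-SNR regime where the log-log curve is close to a straight line.
    """
    xs = [math.log(s) for s in snrs]
    ys = [math.log(d) for d in distortions]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Ordinary least-squares slope of ys against xs.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic check: D = 3 * SNR^(-2) should give an exponent of 2.
snrs = [10.0 ** (db / 10.0) for db in range(20, 61, 10)]
print(round(distortion_exponent(snrs, [3.0 * s ** -2 for s in snrs]), 6))
```

Restricting the fit to high SNR matters, because the log-log curve is generally straight only asymptotically.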

Since source coding diversity is best for on-off channels, and optimal channel coding diversity is best 
for continuous state channels, our second question of whether there exists an architecture that combines 
the advantages of both becomes relevant. In addition to our analysis of the two previously known diversity architectures in Fig. 2, our second main contribution is the description of a new joint source-channel decoding architecture that achieves the best qualities of both. Specifically, to perform well on both continuous-state channels and on-off channels, we propose not a third encoding architecture but a third, joint decoding architecture. We show that the main inefficiency of source coding diversity
on continuous state channels results from the channel decoders ignoring the correlation between the 
multiple descriptions. By explicitly accounting for the structure of the source encoding when performing 
channel decoding, we prove a coding theorem characterizing the performance of source coding diversity 
with joint decoding. We show that such a system can achieve the same performance as optimal channel 
coding diversity on continuous channels and the same performance as source coding diversity for on-off 
channels. 

A. Related Research 

The problem of MD coding was initially studied from a rate-distortion perspective, having been 
formalized by Gersho, Witsenhausen, Wolf, Wyner, Ziv, and Ozarow at the 1979 IEEE Information 
Theory Workshop. Their initial contributions to the problem appear in [29], [42]-[44]. El Gamal & 
Cover develop an achievable rate region for two descriptions in [28], and this region is shown to be 
optimal for the Gaussian source, with mean-square distortion, by Ozarow [44]. Specialized results for the 
binary symmetric source, with Hamming distortion, are developed by Berger & Zhang [24], [26], [45] 
and Ahlswede [27]. Zamir [23] develops high-rate bounds for memoryless sources. Most recently, work by Venkataramani et al. [3], [21] provides achievable rate regions for many descriptions that generalize the results in [26], [28]. Important special cases of the MD coding problem have also been examined, including successive refinement, or layered coding, [1], [46] and certain symmetric cases [2], [20].

Some practical approaches to MD coding include MD scalar quantization, dithered MD lattice quantization, and MD transform coding. Vaishampayan [25] pioneered the first, Frank-Dayan and Zamir considered the use of dither [7], and Wang, Orchard, Vaishampayan, and Reibman [22] and later Goyal & Kovacevic [16] studied the last. See [17] for a thorough review of MD scalar quantization and MD transform coding. Recently, the design of MD video coders has received considerable attention [4], [8]-[10], [13], [19].

All of the classical work on MD coding utilizes an "on-off" model for the channels or networks under 
consideration, without imposing strict delay constraints. More specifically, source codes are designed 
assuming that each description is completely available (error-free) at the receiver, or otherwise completely 
lost. Furthermore, the likelihood of these events occurring is independent of the choice of source coding
rates. Under such conditions, it is not surprising that MD coding outperforms SD coding; however, 
for many practical channel and network environments, these conditions do not hold. For example, in 
delay constrained situations, suitable for real-time or interactive communication, descriptions may have 
to be encoded as multiple packets, each of which might be received or lost individually. Furthermore, 
congestion and outage conditions often depend heavily upon the transmission rate. Thus, it is important 
to consider MD coding over more practical channel models, as well as to fairly compare performance 
with SD coding. 

Some scattered work has begun to appear in this area. Ephremides et al. [11] examine MD coding over a parallel queue channel, compare it to SD coding, and show that MD coding offers significant advantages under high-traffic (congestion) situations. This essentially results because the MD packets are more compact than SD packets, and it indicates the importance of considering the influence of rate on congestion. Coward et al. [6], [15] examine MD coding over several channel models, including memoryless symbol-erasure and symbol-error channels, as well as block fading channels. For strict delay constraints, they show that MD outperforms SD; for longer delay constraints, allowing for more sophisticated channel coding, they show that SD outperforms MD. Thus, delay constraints have an important impact on the comparison. This paper examines fading
conditions similar to those in [6], [15], but considers a wider variety of channel coding and decoding 
options, with an emphasis on architectural considerations as well as performance. 
B. Outline

We begin by summarizing our system model in Section II. Section III studies on-off channels, Section IV treats continuous-state channels, and Section V develops source coding diversity with joint decoding. Many of the more detailed proofs are deferred to the Appendices. Finally, Section VI closes the paper with some concluding remarks and directions for further research.

II. System Model 

Fig. 1 depicts the general system model we consider in this paper. Our objective is to design and evaluate methods for communicating a source signal $\mathbf{s}$ with small distortion over certain channels with independent parallel components. In particular, focusing on memoryless source models for simplicity of exposition, we consider non-ergodic channel models in which delay constraints or limited channel variations limit the effective blocklength at the encoder. Of many possible examples, we focus on on-off channels and additive noise channels with block fading. While cross-layer design is generally acknowledged to yield superior performance to layered design, simultaneously optimizing all facets of a system is usually too complex. Hence, we consider various architectures based upon using a classical system at one layer combined with an optimized system at another layer. In the remainder of this section, after briefly introducing some notation, we summarize the source and channel models, discuss architectural options for encoding and decoding, and review high-resolution approximations for the various source coding algorithms employed throughout the paper.

A. Notation 

Vectors and sequences are denoted in bold (e.g., $\mathbf{x}$) with the $i$th element denoted as $x[i]$. Random variables are denoted using the sans serif font (e.g., $\mathsf{x}$), while random vectors and sequences are denoted with bold sans serif (e.g., $\boldsymbol{\mathsf{x}}$). We denote mutual information, differential entropy, and expectation as $I(\mathsf{x};\mathsf{y})$, $h(\mathsf{x})$, and $E[\mathsf{x}]$. Calligraphic letters denote sets (e.g., $s \in \mathcal{S}$). When its argument is a set or alphabet, $|\cdot|$ denotes the cardinality of the argument. To simplify the discussion of architectures, we use the symbols $\mathrm{ENC}(\cdot)$ and $\mathrm{DEC}(\cdot)$ to denote a generic encoder and decoder. To specialize this generic notation to one of the architectures discussed in Section II-D, we will employ subscripts representing the relevant system variables.

B. Source Model 

We model the source as a sequence of independent and identically distributed (i.i.d.) samples $\mathsf{s}[k]$. For example, such a discrete-time source may be obtained from sampling a continuous-time, appropriately band-limited, white-noise random process. We denote the probability density for the discrete-time source sequence $\mathsf{s}[k]$ as

$$ p_{\mathbf{s}}(\mathbf{s}) = \prod_{k=1}^{K} p_{\mathsf{s}}(s[k]) . \tag{1} $$

We assume that the process is such that the differential entropy, $h(\mathsf{s})$, and second moment, $E[\mathsf{s}^2]$, both exist and are finite.

To measure the quality of the communication system, we employ a distortion measure between the source signal $\mathbf{s}$ and its reconstruction $\hat{\mathbf{s}} \in \hat{\mathcal{S}}^K$. Specifically, given a per-letter distortion measure $d(s[k], \hat{s}[k])$, we extend it additively to blocks of source samples, i.e.,

$$ d(\mathbf{s}, \hat{\mathbf{s}}) = \sum_{k=1}^{K} d(s[k], \hat{s}[k]) . \tag{2} $$

We may characterize performance in terms of various statistics of the distortion, viewed as a random variable. In particular, we focus on the expected distortion

$$ D = E[d(\mathbf{s}, \hat{\mathbf{s}})] . \tag{3} $$

Throughout our development, we will emphasize squared-error distortion, for which $d(s, \hat{s}) = (s - \hat{s})^2$; in this case, $D$ is the mean-square distortion.

C. (Parallel) Channel Model 

The channel depicted in Fig. 1 consists of two branches, each of which corresponds to an independent channel with independent states. Specifically, a channel input block, $\mathbf{x}$, consists of two sub-blocks, $\mathbf{x}_1$ and $\mathbf{x}_2$, and the corresponding channel output block, $\mathbf{y}$, consists of the two sub-blocks, $\mathbf{y}_1$ and $\mathbf{y}_2$. The channel states are denoted by random variables $\mathsf{a}_1$ and $\mathsf{a}_2$, respectively. The channel law is the product of the two independent sub-channel laws:

$$ p_{\mathbf{y}_1,\mathbf{y}_2,\mathsf{a}_1,\mathsf{a}_2|\mathbf{x}_1,\mathbf{x}_2}(\mathbf{y}_1,\mathbf{y}_2,a_1,a_2|\mathbf{x}_1,\mathbf{x}_2) = p_{\mathbf{y},\mathsf{a}|\mathbf{x}}(\mathbf{y}_1,a_1|\mathbf{x}_1)\, p_{\mathbf{y},\mathsf{a}|\mathbf{x}}(\mathbf{y}_2,a_2|\mathbf{x}_2) = p_{\mathsf{a}}(a_1)\, p_{\mathsf{a}}(a_2) \prod_{j=1}^{N} \left[ p_{\mathsf{y}|\mathsf{x},\mathsf{a}}(y_1[j]|x_1[j],a_1)\, p_{\mathsf{y}|\mathsf{x},\mathsf{a}}(y_2[j]|x_2[j],a_2) \right] . \tag{4} $$

For simplicity, we only consider channels for which the input distribution that maximizes the mutual information is independent of the channel state. Throughout the paper, we consider the case where both the transmitter and receiver know the channel state distribution and the channel law $p_{\mathsf{y}|\mathsf{x},\mathsf{a}}$, but only the receiver knows the realized channel states and channel outputs.
To examine fundamental performance and compare systems, we analyze random coding over these non-ergodic channels using outage probability [47] as a performance measure. Briefly, because the mutual information $\mathsf{I}$, corresponding to the supportable transmission rate of the channel, is a function of the fading coefficients or other channel uncertainty, it too is a random variable. For fixed transmission rate $R$ (in nats/channel use), the outage probability $\Pr[\mathsf{I} < R]$ measures channel coding robustness to uncertainty in the channel.¹

The structure of the channel coding and decoding affects the form of the outage probability expression [47]. If coding is performed over only the first component channel, then the probability of decoding failure is $\Pr[I(\mathbf{x}_1;\mathbf{y}_1) < R]$. If repetition coding is performed across the parallel channels, then a single message is encoded as $\mathbf{x}_1 = \mathbf{x}_2 = \mathbf{x}$. With selection combining at the receiver, the probability of decoding failure is $\Pr\{\max[I(\mathbf{x};\mathbf{y}_1), I(\mathbf{x};\mathbf{y}_2)] < R\}$; with optimal maximum-ratio combining at the receiver, the probability of decoding failure is $\Pr\{I(\mathbf{x};\mathbf{y}_1,\mathbf{y}_2) < R\}$. Finally, if optimal parallel channel coding is performed using a pair of jointly designed codebooks with $\mathbf{x}_1$ and $\mathbf{x}_2$ independent, the probability of decoding failure is $\Pr[I(\mathbf{x}_1;\mathbf{y}_1) + I(\mathbf{x}_2;\mathbf{y}_2) < R]$.
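The four failure probabilities above can be compared numerically. The following Monte Carlo sketch assumes, purely for illustration, two independent complex AWGN components with Rayleigh fading and Gaussian inputs, so that $I(\mathbf{x}_i;\mathbf{y}_i) = \ln(1 + \mathrm{SNR}\, g_i)$ per channel use with fading power gains $g_i$ exponentially distributed; the function name and parameters are hypothetical.

```python
import math
import random


def outage_probabilities(snr: float, rate: float, trials: int = 100_000, seed: int = 0) -> dict:
    """Monte Carlo estimates of the four decoding-failure probabilities.

    Illustrative channel assumption: each component is complex AWGN with
    Rayleigh fading and Gaussian inputs, so its mutual information per channel
    use is ln(1 + snr * g) with fading power gain g ~ Exp(1). `rate` is in nats
    per parallel channel use.
    """
    rng = random.Random(seed)
    counts = {"single": 0, "selection": 0, "mrc": 0, "parallel": 0}
    for _ in range(trials):
        g1 = rng.expovariate(1.0)
        g2 = rng.expovariate(1.0)
        i1 = math.log1p(snr * g1)
        i2 = math.log1p(snr * g2)
        counts["single"] += i1 < rate                        # coding over channel 1 only
        counts["selection"] += max(i1, i2) < rate            # repetition, selection combining
        counts["mrc"] += math.log1p(snr * (g1 + g2)) < rate  # repetition, maximum-ratio combining
        counts["parallel"] += i1 + i2 < rate                 # jointly designed parallel codebooks
    return {name: count / trials for name, count in counts.items()}


print(outage_probabilities(snr=10.0, rate=1.0))
```

For every fading draw, $\ln(1+\mathrm{SNR}(g_1+g_2)) \le \ln(1+\mathrm{SNR}\,g_1) + \ln(1+\mathrm{SNR}\,g_2)$ and $\ln(1+\mathrm{SNR}(g_1+g_2)) \ge \max_i \ln(1+\mathrm{SNR}\,g_i)$, so the estimates are always ordered: parallel coding, then maximum-ratio combining, then selection combining, then the single channel.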

D. Architectural Options 

In this section, we specify some architectural options for encoding and decoding in the source-channel diversity system depicted in Fig. 1.

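1) Joint Source-Channel Diversity: In the most general setup, joint source-channel diversity consists of a pair of mappings $(\mathrm{ENC}_{\mathbf{x}_1,\mathbf{x}_2\leftarrow\mathbf{s}}, \mathrm{DEC}_{\hat{\mathbf{s}}\leftarrow\mathbf{y}_1,\mathbf{y}_2})$. The encoder $\mathrm{ENC}_{\mathbf{x}_1,\mathbf{x}_2\leftarrow\mathbf{s}}$ maps a sequence of $K$ source letters into $N$ pairs of channel inputs; correspondingly, the decoder maps $N$ pairs of channel outputs into $K$ reconstruction letters. The ratio $N/K$ (sometimes referred to as the processing gain, excess bandwidth, or bandwidth expansion factor) is denoted with the symbol $\beta = N/K$.² Mathematically,

$$ \mathrm{ENC}_{\mathbf{x}_1,\mathbf{x}_2\leftarrow\mathbf{s}} : \mathcal{S}^K \to \mathcal{X}^N \times \mathcal{X}^N \tag{5} $$

$$ \mathrm{DEC}_{\hat{\mathbf{s}}\leftarrow\mathbf{y}_1,\mathbf{y}_2} : \mathcal{Y}^N \times \mathcal{Y}^N \to \hat{\mathcal{S}}^K . \tag{6} $$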
¹Mutual information is often used to measure channel robustness when long block lengths are allowed. In [48], however, Zheng and Tse show that mutual information (viewed as a random variable), and more specifically outage probability, is a relevant quantity for finite block lengths, since outage probability dominates error probability. This suggests that outage can be a relevant quantity even for very tight delay constraints at high SNR.

²The bandwidth expansion ratio in [49] (denoted by $L$) is defined slightly differently from $\beta$. Specifically, since [49] considers a complex source and a Rayleigh fading Gaussian noise channel, $L = 2\beta$.
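If the image of $\mathrm{ENC}_{\mathbf{x}_1,\mathbf{x}_2\leftarrow\mathbf{s}}$, i.e., $\mathrm{ENC}_{\mathbf{x}_1,\mathbf{x}_2\leftarrow\mathbf{s}}(\mathcal{S}^K)$, is finite, we define the rate of the code as

$$ R = \frac{\ln|\mathrm{ENC}_{\mathbf{x}_1,\mathbf{x}_2\leftarrow\mathbf{s}}(\mathcal{S}^K)|}{N} , \tag{7} $$

which has units of nats per parallel channel use.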

Regarding the non-ergodic nature of the channels, we consider situations in which K is large enough 
to average over source fluctuations, i.e., the source is ergodic, but N is not large enough to average over 
channel variations, i.e., the channel is non-ergodic. 

2) Channel Coding Diversity: From one perspective, a natural way to exploit diversity in the channel is to employ repetition or more powerful channel codes applied to a single digital representation of the source. In such scenarios, Fig. 1 specializes to that shown in Fig. 4. Such channel coding diversity consists of a source pair of encoder and decoder mappings $(\mathrm{ENC}_{\mathsf{m}\leftarrow\mathbf{s}}, \mathrm{DEC}_{\hat{\mathbf{s}}\leftarrow\hat{\mathsf{m}}})$ and a channel pair of encoder and decoder mappings $(\mathrm{ENC}_{\mathbf{x}\leftarrow\mathsf{m}}, \mathrm{DEC}_{\hat{\mathsf{m}}\leftarrow\mathbf{y}})$. As in classical rate-distortion source coding, the source encoder maps a sequence of $K$ input letters to a finite index, and the source decoder maps an index into a sequence of $K$ reconstruction letters:

$$ \mathrm{ENC}_{\mathsf{m}\leftarrow\mathbf{s}} : \mathcal{S}^K \to \{1,2,\ldots,|\mathcal{M}|\} \tag{8} $$

$$ \mathrm{DEC}_{\hat{\mathbf{s}}\leftarrow\hat{\mathsf{m}}} : \{0,1,2,\ldots,|\mathcal{M}|\} \to \hat{\mathcal{S}}^K \tag{9} $$

Further, as in classical channel coding, the channel encoder maps an index into pairs of channel inputs, and the channel decoder maps pairs of channel outputs into an index:

$$ \mathrm{ENC}_{\mathbf{x}\leftarrow\mathsf{m}} : \{1,2,\ldots,|\mathcal{M}|\} \to \mathcal{X}^N \times \mathcal{X}^N \tag{10} $$

$$ \mathrm{DEC}_{\hat{\mathsf{m}}\leftarrow\mathbf{y}} : \mathcal{Y}^N \times \mathcal{Y}^N \to \{0,1,\ldots,|\mathcal{M}|\} . \tag{11} $$

Note that we include the index 0 at the output of the channel decoder and the input to the source decoder. This serves as a flag in the event of a (detected) channel coding error or outage, in which case the source decoder reconstructs to the mean of the source.
Fig. 5. Source coding diversity system model described more precisely in Section II-D.3.



For the channel coding diversity approach, a key parameter is the rate defined by

$$ R = \frac{\ln|\mathcal{M}|}{N} , \tag{12} $$

where again the units are nats per parallel channel use.
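Combining the rate $R$ in (12) with the outage rule for optimal parallel channel coding gives a simple numerical model of the average distortion of channel coding diversity. The sketch below assumes, for illustration only, a unit-variance source with the high-resolution relation $D \approx e^{-2\beta R}$ when decoding succeeds, reconstruction to the source mean (distortion 1) when the channel decoder flags an outage, and independent Rayleigh-faded AWGN components with Gaussian inputs; all names are hypothetical.

```python
import math
import random


def avg_distortion_channel_div(snr: float, rate: float, beta: float = 1.0,
                               trials: int = 100_000, seed: int = 0) -> float:
    """Average distortion of SD source coding plus optimal parallel channel coding.

    Illustrative assumptions: distortion exp(-2 * beta * rate) when decoding
    succeeds (high-resolution, unit-variance source), distortion 1 when
    I(x1;y1) + I(x2;y2) < rate (outage, reconstruct to the mean), and
    I(x_i;y_i) = ln(1 + snr * g_i) with independent g_i ~ Exp(1).
    """
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        i1 = math.log1p(snr * rng.expovariate(1.0))
        i2 = math.log1p(snr * rng.expovariate(1.0))
        outages += i1 + i2 < rate
    p_out = outages / trials
    return p_out * 1.0 + (1.0 - p_out) * math.exp(-2.0 * beta * rate)


# Trade-off in the rate: a larger rate lowers the no-outage distortion but
# raises the outage probability.
for r in (0.5, 1.0, 2.0, 4.0):
    print(r, avg_distortion_channel_div(snr=100.0, rate=r, trials=20_000))
```

Minimizing over `rate` for each SNR and examining how the minimum decays with SNR on a log-log scale is one way to approximate the distortion exponent of this architecture numerically.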

3) Source Coding Diversity: Instead of exploiting diversity through channel coding, an emerging class
of source coding algorithms based upon MD coding allows diversity to be exploited by the source coding 
layer. 

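For such source coding diversity, the block diagram of Fig. 1 specializes to that shown in Fig. 5. Source coding diversity employs two independent, but otherwise classical, channel encoder and decoder pairs $(\mathrm{ENC}_{\mathbf{x}_1\leftarrow\mathsf{m}_1}, \mathrm{DEC}_{\hat{\mathsf{m}}_1\leftarrow\mathbf{y}_1})$ and $(\mathrm{ENC}_{\mathbf{x}_2\leftarrow\mathsf{m}_2}, \mathrm{DEC}_{\hat{\mathsf{m}}_2\leftarrow\mathbf{y}_2})$:

$$ \mathrm{ENC}_{\mathbf{x}_i\leftarrow\mathsf{m}_i} : \{1,2,\ldots,|\mathcal{M}_i|\} \to \mathcal{X}^N \tag{13} $$

$$ \mathrm{DEC}_{\hat{\mathsf{m}}_i\leftarrow\mathbf{y}_i} : \mathcal{Y}^N \to \{0,1,2,\ldots,|\mathcal{M}_i|\} , \tag{14} $$

for $i = 1,2$. Again, we allow the output of the channel decoding process to indicate a (detected) error. Here the rates

$$ R_i = \frac{\ln|\mathcal{M}_i|}{N} , \quad i = 1,2, \tag{15} $$

both in nats per parallel channel use, are key parameters of the system.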
The source encoder consists of two mappings 

$$ \mathrm{ENC}_{\mathsf{m}_i\leftarrow\mathbf{s}} : \mathcal{S}^K \to \{1,2,\ldots,|\mathcal{M}_i|\} , \quad i = 1,2. \tag{16} $$

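The source decoder can be viewed as four separate mappings, depending upon whether or not there are channel decoding errors on the individual channels. Specifically, the source decoder can be constructed from the following four mappings:

$$ \mathrm{DEC}_{\hat{\mathbf{s}}\leftarrow\hat{\mathsf{m}}_1,\hat{\mathsf{m}}_2} : \{0\} \times \{0\} \to \{s^*\}^K \tag{17} $$

$$ \mathrm{DEC}_{\hat{\mathbf{s}}_1\leftarrow\hat{\mathsf{m}}_1} : \{1,2,\ldots,|\mathcal{M}_1|\} \times \{0\} \to \hat{\mathcal{S}}^K \tag{18} $$

$$ \mathrm{DEC}_{\hat{\mathbf{s}}_2\leftarrow\hat{\mathsf{m}}_2} : \{0\} \times \{1,2,\ldots,|\mathcal{M}_2|\} \to \hat{\mathcal{S}}^K \tag{19} $$

$$ \mathrm{DEC}_{\hat{\mathbf{s}}_0\leftarrow\hat{\mathsf{m}}_1,\hat{\mathsf{m}}_2} : \{1,2,\ldots,|\mathcal{M}_1|\} \times \{1,2,\ldots,|\mathcal{M}_2|\} \to \hat{\mathcal{S}}^K , \tag{20} $$

where $s^*$ is a constant determined by the distortion measure for the source; for example, if the distortion measure is mean-square error, then $s^* = E[\mathsf{s}]$. Tab. I summarizes how these mappings are employed.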
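The role of the four mappings (17)-(20) can be summarized as a lookup on the two channel-decoder outputs, where index 0 flags a detected failure. The sketch below is illustrative: it assumes a symmetric MD code with central distortion `d0` and side distortion `d1`, mean-square distortion (so the all-failed case outputs $s^* = E[\mathsf{s}]$ with distortion equal to the source variance), and hypothetical function and parameter names.

```python
def source_decoder_case(m1_hat: int, m2_hat: int, d0: float, d1: float,
                        source_var: float = 1.0) -> tuple:
    """Return (mapping used, resulting mean-square distortion).

    m1_hat, m2_hat: channel decoder outputs; 0 flags a detected decoding error.
    Assumes a symmetric MD code: central distortion d0, side distortion d1.
    """
    got1, got2 = m1_hat != 0, m2_hat != 0
    if got1 and got2:
        return "(20) central decoder", d0
    if got1:
        return "(18) side decoder 1", d1
    if got2:
        return "(19) side decoder 2", d1
    return "(17) constant s* = E[s]", source_var


print(source_decoder_case(7, 0, d0=0.01, d1=0.1))
```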

4) Source Coding Diversity with Joint Decoding: Finally, we also consider source coding diversity with joint decoding, as depicted in Fig. 6. Here everything is the same as in the source coding diversity model of Fig. 5, except that source and channel decoding is performed jointly across channels by accounting for correlation among the channel coding inputs $\mathsf{m}_1$ and $\mathsf{m}_2$. Specifically, the channel decoding for this approach is a mapping

$$ \mathrm{DEC}_{\hat{\mathsf{m}}_1,\hat{\mathsf{m}}_2\leftarrow\mathbf{y}_1,\mathbf{y}_2} : \mathcal{Y}^N \times \mathcal{Y}^N \to \{0,1,2,\ldots,|\mathcal{M}_1|\} \times \{0,1,2,\ldots,|\mathcal{M}_2|\} \tag{21} $$

which also takes into account knowledge of the source coding structure. In practice, full joint design of the decoder may not be required, and a partially separated design in which likelihood ratios, quantized likelihood ratios, or similar information are exchanged between the source and channel decoders may be sufficient.

E. High-Resolution Approximations for Source Coding 

An important practical example of our source model is the Gaussian source, for which $p_{\mathsf{s}}(s)$ is a
Gaussian density function with zero mean and unit variance. The Gaussian source also serves as a 
useful approximation to other sources in the high resolution (low distortion) regime [23], [50]. We now 
summarize the well-known results for single- and multiple-description source coding for the Gaussian 
case, and generalize them using the high resolution distortion approximations. These high resolution 
approximations are utilized throughout the sequel in our performance analysis. 

1) Single Description Source Coding: In SD source coding, or classical rate-distortion theory, the source, $\mathbf{s}$, is quantized into a single description, $\hat{\mathbf{s}}$, using rate $R$.

In general, the rate-distortion function is difficult to determine, but a number of researchers have determined the rate-distortion function in the high-resolution limit. Specifically, under some mild technical conditions [50],

$$ \lim_{D \to 0} \left[ R(D) - \frac{1}{2}\log\frac{e^{2h(\mathsf{s})}}{2\pi e D} \right] = 0 . \tag{22} $$

This result also implies that³

$$ R(D) \sim \frac{1}{2}\log\frac{e^{2h(\mathsf{s})}}{2\pi e D} . \tag{23} $$

Without loss of generality, we scale a given source under consideration so that $e^{2h(\mathsf{s})} = 2\pi e$ to simplify the notation. Furthermore, instead of measuring the quantization rate in bits, we will find it more convenient to measure the rate in nats per channel sample by using the processing gain $\beta$ defined in Section II-D.1. Thus, we will use the expressions

$$ R(D) \sim \frac{1}{2\beta}\ln\frac{1}{D} \quad\text{and}\quad \exp R(D) \approx D^{-1/(2\beta)} \tag{24} $$

to approximate $R(D)$ and $\exp R(D)$ in the high-resolution regime.

³Throughout the paper, the approximation $f(x) \sim g(x)$ is in the sense that $f(x)/g(x) \to 1$ and $|f(x) - g(x)| \to 0$ as $x$ approaches a limit, either $x \to 0$ or $x \to \infty$, which should be clear from the context.
As is well known, the rate (in nats/channel sample) required for SD source coding of a unit-variance Gaussian source at average distortion $D$ for any resolution is [36]

$$ R(D) = \frac{1}{2\beta}\ln\frac{1}{D} , \quad 0 < D \le 1 . \tag{25} $$

Therefore, one way to interpret (23) is that, for difference distortion measures in the high-resolution limit, all sources essentially look Gaussian except for scaling by the constant factor $\exp[2h(\mathsf{s})]/(2\pi e)$. Note that the form of the rate-distortion function in (23) is asymptotically accurate and not a worst-case result like those in [51], [52].
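The high-resolution relations (23)-(25) reduce to a pair of one-line conversions between distortion and rate in nats per channel use. The sketch below keeps $h(\mathsf{s})$ explicit (in nats) rather than normalizing it away; its default value, $\tfrac{1}{2}\ln(2\pi e)$, is the unit-variance Gaussian case adopted in the text, and the function names are illustrative.

```python
import math

# Differential entropy (nats) of a unit-variance Gaussian: exp(2h) = 2*pi*e.
H_UNIT_GAUSSIAN = 0.5 * math.log(2 * math.pi * math.e)


def sd_rate(distortion: float, beta: float, h: float = H_UNIT_GAUSSIAN) -> float:
    """High-resolution SD rate in nats per channel use: ln(e^{2h} / (2 pi e D)) / (2 beta)."""
    return math.log(math.exp(2 * h) / (2 * math.pi * math.e * distortion)) / (2 * beta)


def sd_distortion(rate: float, beta: float, h: float = H_UNIT_GAUSSIAN) -> float:
    """Inverse of sd_rate: D = e^{2h} / (2 pi e) * exp(-2 beta R)."""
    return math.exp(2 * h) / (2 * math.pi * math.e) * math.exp(-2 * beta * rate)


r = sd_rate(1e-3, beta=2.0)
print(r, sd_distortion(r, beta=2.0))
```

For a Gaussian source with $D > 1$ the true rate is zero; the sketch does not clip, since it is meant only for the high-resolution regime.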

2) Multiple Description Source Coding: In contrast to SD coding, MD source coding quantizes the source into two descriptions, $\hat{\mathbf{s}}_1$ and $\hat{\mathbf{s}}_2$, so that if only one is received then moderate distortion is incurred, and if both descriptions are received then lower distortion is obtained [28].

The rates and distortions achievable by coding a unit-variance Gaussian source into two equal-rate descriptions with a total rate of $R_{\mathrm{md}}$ nats per channel sample (i.e., each description requires $R_{\mathrm{md}}/2$ nats) satisfy [28]

$$ R_{\mathrm{md}}(D_0,D_1) = \frac{1}{2\beta}\ln\frac{1}{D_0} + \frac{1}{2\beta}\ln\frac{(1-D_0)^2}{(1-D_0)^2 - (1-2D_1+D_0)^2} \tag{26a} $$

in the case of low distortions ($2D_1 - D_0 < 1$), where $D_0$ is the distortion when both descriptions are received and $D_1$ is the distortion when only a single description is received. For high distortions ($2D_1 - D_0 > 1$), there is no penalty for the multiple descriptions, and the total rate required is

$$ R_{\mathrm{md}}(D_0,D_1) = \frac{1}{2\beta}\ln\frac{1}{D_0} . \tag{26b} $$

The general rate-distortion region for the MD coding problem is still unknown for the Gaussian case with more than two descriptions and for more general sources. In the high-resolution limit, the rate-distortion region is the same as for a Gaussian source with variance $\exp[2h(\mathsf{s})]/(2\pi e)$ [23]. Hence, for our asymptotic analysis, we use the rate-distortion function in (26) for both Gaussian and non-Gaussian sources with $\exp[2h(\mathsf{s})]/(2\pi e) = 1$.

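Exponentiating (26a) yields

$$ \exp[R_{\mathrm{md}}(D_0,D_1)] = D_0^{-1/(2\beta)} \cdot (1-D_0)^{1/\beta} \cdot \left(1 - 2D_0 + D_0^2 - 1 - D_0^2 - 2D_0 + 4D_1 + 4D_0D_1 - 4D_1^2\right)^{-1/(2\beta)} \tag{27} $$

$$ = D_0^{-1/(2\beta)} \cdot (1-D_0)^{1/\beta} \cdot \left(4D_1 - 4D_0 + 4D_0D_1 - 4D_1^2\right)^{-1/(2\beta)} \tag{28} $$

$$ \sim D_0^{-1/(2\beta)} \cdot \left(4D_1 - 4D_0\right)^{-1/(2\beta)} , \tag{29} $$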



where the last line follows since $(1 - D_0) \to 1$ and $4(D_1 - D_0 + D_0D_1 - D_1^2) \sim 4(D_1 - D_0)$ as $D_0 \to 0$ and $D_1 \to 0$. If only $D_0 \to 0$, then the $\sim$ in (29) must be replaced with $\gtrsim$. Any reasonable multiple description system has $D_0 < D_1/2$ (otherwise the denominator of (26a) could easily be increased while decreasing the distortion by setting $D_1 = 2D_0$). So, since $2D_1 < 4(D_1 - D_0) < 4D_1$, we obtain

$$ (4D_0D_1)^{-1/(2\beta)} \lesssim \exp[R_{\mathrm{md}}(D_0,D_1)] \lesssim (2D_0D_1)^{-1/(2\beta)} , \tag{30} $$

where the lower bound holds when $D_0 \to 0$ and the upper bound also requires $D_1 \to 0$.
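The following sketch evaluates the symmetric two-description rate (26a)/(26b) in nats per channel use and checks it against the bounds in (30) at one small-distortion operating point. It assumes the unit-variance normalization used above and $D_0 < D_1$ (at $D_0 = D_1$ the excess-rate term in (26a) is unbounded); the function name and the chosen operating point are illustrative.

```python
import math


def md_rate(d0: float, d1: float, beta: float) -> float:
    """Total rate R_md(D0, D1), nats per channel use, from (26a)/(26b).

    Unit-variance Gaussian source (or the high-resolution normalization),
    two equal-rate descriptions, central distortion d0 < side distortion d1.
    """
    if not 0.0 < d0 < d1 <= 1.0:
        raise ValueError("expected 0 < d0 < d1 <= 1")
    base = math.log(1.0 / d0) / (2.0 * beta)
    if 2.0 * d1 - d0 > 1.0:                        # (26b): no multiple-description penalty
        return base
    excess = (1.0 - d0) ** 2 / ((1.0 - d0) ** 2 - (1.0 - 2.0 * d1 + d0) ** 2)
    return base + math.log(excess) / (2.0 * beta)  # (26a)


d0, d1, beta = 1e-4, 1e-2, 1.0
value = math.exp(md_rate(d0, d1, beta))
lower = (4.0 * d0 * d1) ** (-1.0 / (2.0 * beta))
upper = (2.0 * d0 * d1) ** (-1.0 / (2.0 * beta))
print(lower, value, upper)  # value lies between the bounds of (30)
```

The denominator of (26a) factors as $4(D_1 - D_0)(1 - D_1)$, which is a quick way to see why the approximation (29) needs $D_1 \to 0$ as well as $D_0 \to 0$.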

III. On-Off Component Channels 

In this section, we examine the performance of source and channel coding diversity for scenarios 
in which each of the component channels is either "on", supporting a given transmission rate, or "off", supporting no rate (or an arbitrarily small rate). Much of the literature suggests that source coding diversity was developed for, and performs well on, such channel models. Our analysis is based upon channels that are parameterized in a manner similar to the continuous channels in Section IV. This parameterization
allows us to compare source and channel coding diversity over a broad range of operating conditions. In 
addition to confirming that there exist operating conditions for which source coding diversity significantly 
outperforms channel coding diversity, our results illustrate that there also exist operating conditions for 
which the performance difference between source and channel coding diversity is negligible. 

A. Component Channel Model 

For cases in which we are concerned with prolonged, deep fading or shadowing in a mobile radio 
channel, strong first-adjacent interference in a terrestrial broadcast channel, or congestion in a network, 
we can model the channel state as taking on only two possible values. Specifically, we can consider 
on-off channels where the channel mutual information has probability law

$$ I = \begin{cases} \ln(1 + \mathrm{SNR}), & \text{with probability } 1 - \varepsilon \\ 0, & \text{with probability } \varepsilon \end{cases} \qquad (31) $$

In (31), SNR parameterizes the channel quality when the channel is on, and ε parameterizes the probability that the channel is off. There is no connection between the channels' probability of being off and the quality in the on state; that is, neither SNR nor the selected encoding rate R affects ε. By contrast, for the continuous channels discussed in Section IV, ε will depend directly on both.

For simplicity of exposition, and ease of comparison with continuous channel scenarios in the sequel, the term outage will refer to the inability of a given approach to convey information over the pair of component channels. If both channels are off, then the system experiences outage regardless of the communication approach; however, as we will see, different approaches may or may not experience outage when one of the channels is on and the other is off. For all of the approaches we discuss, due to the nature of the on-off channels, performance can be classified into two regimes. The quality-limited regime has average distortion performance varying dramatically with the channel quality in the on state, because the distortion under no outage dominates the average distortion. In this case, the distortion under no outage is limited by the rate communicated, which, in turn, is limited by the channel quality. The outage-limited regime has average distortion performance that does not vary dramatically with the channel quality in the on state, because the distortion under outage dominates the average distortion.

B. No Diversity 

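Combining an SD source coder with a channel encoder and decoder over a single component channel, the average distortion, as a function of the source coding rate R, is given by

$$ E[D_{\text{no-div}}(R)] = \begin{cases} (1-\varepsilon)\exp(-2\beta R) + \varepsilon, & 0 < R \le \ln(1 + \mathrm{SNR}) \\ 1, & \text{otherwise.} \end{cases} \qquad (32) $$

Since the first case of (32) decreases in R, the minimum is attained at the largest feasible rate, R = ln(1 + SNR). Thus, the minimum average distortion is

$$ D_{\text{no-div}} = \min_R E[D_{\text{no-div}}(R)] = (1-\varepsilon)(1+\mathrm{SNR})^{-2\beta} + \varepsilon. \qquad (33) $$

We say that this system operates in the quality-limited regime if

$$ (1 + \mathrm{SNR})^{2\beta} < \frac{1-\varepsilon}{\varepsilon}, \qquad (34) $$

in which case the average distortion behaves essentially as (1 − ε)(1 + SNR)^{−2β}. If

$$ (1 + \mathrm{SNR})^{2\beta} > \frac{1-\varepsilon}{\varepsilon}, \qquad (35) $$

the system operates in the outage-limited regime, in which case the average distortion behaves essentially as ε.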
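The following sketch (our own illustration; the function names and the choice ε = 10^{-3}, β = 1 are assumptions) evaluates (33) and labels each SNR with the regime given by (34) and (35).

```python
def d_no_div(snr: float, eps: float, beta: float) -> float:
    """Minimum average distortion (33) for one on-off channel without diversity."""
    return (1 - eps) * (1 + snr) ** (-2 * beta) + eps


def regime_no_div(snr: float, eps: float, beta: float) -> str:
    """Operating regime according to (34) (quality-limited) or (35) (outage-limited)."""
    if (1 + snr) ** (2 * beta) < (1 - eps) / eps:
        return "quality-limited"
    return "outage-limited"


eps, beta = 1e-3, 1.0
for snr_db in (0, 10, 20, 30, 40):
    snr = 10 ** (snr_db / 10)
    print(f"{snr_db:2d} dB  D={d_no_div(snr, eps, beta):.3e}  {regime_no_div(snr, eps, beta)}")
```

With these values the threshold in (34) is crossed near 15 dB; beyond it the distortion levels off at about ε, no matter how large SNR becomes.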
C. Optimal Channel Coding Diversity

Combining an SD source coder with optimal parallel channel coding over the component channels, the average distortion, as a function of the source coding rate R, is given by

$$ E[D_{\text{opt-ccdiv}}(R)] = \begin{cases} (1-\varepsilon^2)\exp(-2\beta R) + \varepsilon^2, & 0 < R \le \ln(1 + \mathrm{SNR}/2) \\ (1-\varepsilon)^2\exp(-2\beta R) + [1 - (1-\varepsilon)^2], & \ln(1 + \mathrm{SNR}/2) < R \le 2\ln(1 + \mathrm{SNR}/2) \\ 1, & \text{otherwise.} \end{cases} \qquad (36) $$

For parallel channel coding, the two channel codewords are independent, and the system is able to sum the mutual informations of the component channels. This leads to the upper bound R ≤ 2 ln(1 + SNR/2) in the second case of (36). If we instead utilized repetition coding, so that the two channel codewords are identical, the upper bound in the second case would instead be R ≤ ln(1 + SNR).

In contrast to the case of no diversity, the performance of optimal channel coding diversity exhibits a discontinuity as a function of R. Fig. 7 illustrates that, because of the discrete probability distribution on the channel states, a discontinuity arises in the outage probability at the point R = ln(1 + SNR/2).

Clearly, each case in (36) is minimized by utilizing the largest possible rate for that case. Then the minimum average distortion becomes

$$ D_{\text{opt-ccdiv}} = \min_R E[D_{\text{opt-ccdiv}}(R)] = \min\left\{ (1-\varepsilon^2)(1+\mathrm{SNR}/2)^{-2\beta} + \varepsilon^2,\; (1-\varepsilon)^2(1+\mathrm{SNR}/2)^{-4\beta} + [1-(1-\varepsilon)^2] \right\}. \qquad (37) $$

As Fig. 8 illustrates, the two terms in (37) have their own quality- and outage-limited regimes, which, when combined by the minimum operation, lead to four trends in the overall system performance.

Comparing the two terms in (37), we see that the different choices of rate lead to different costs and benefits. Using the lower transmission rate, R = ln(1 + SNR/2) (cf. the first term in (37)), results in better outage-limited performance but worse quality-limited performance. This approach exploits the diversity gain of the underlying parallel channel. On the other hand, using the higher transmission rate, R = 2 ln(1 + SNR/2) (cf. the second term in (37)), results in worse outage-limited performance but better quality-limited performance. This approach exploits the multiplexing gain of the underlying parallel channel. We note that the diversity and multiplexing terminology is inspired by the inherent tradeoff between the two for multiple-input, multiple-output (MIMO) wireless systems operating over fading channels [48].

[Fig. 7: horizontal axis I(x_1;y_1), vertical axis I(x_2;y_2); annotation "Large Multiplexing Gain".]
Fig. 7. Outage region boundaries for optimal parallel channel coding. The marker symbols correspond to the sample mutual information pairs (0, 0), (0, ln(1 + SNR/2)), (ln(1 + SNR/2), 0), and (ln(1 + SNR/2), ln(1 + SNR/2)). The solid line corresponds to the first case of (36), in which a low rate is selected to take advantage of diversity gain. The dashed line corresponds to the second case of (36), in which a higher rate is selected to take advantage of multiplexing gain. Outage regions are below and to the left of these diagonals.

Fig. 8. Average distortion performance with ε = 10^{…} for the first (solid line) and second (dashed line) terms in the minimization of (37).

Note that the two terms in (37) are equal when

$$ (1 + \mathrm{SNR}/2)^{2\beta} = \frac{1-\varepsilon}{2\varepsilon}. \qquad (38) $$

(Setting the two terms equal gives a quadratic in (1 + SNR/2)^{−2β} whose roots are 1, corresponding to SNR = 0, and 2ε/(1 − ε), which yields (38).) For small SNR (such that (1 + SNR/2)^{2β} < (1 − ε)/(2ε)), we exploit the multiplexing mode of operation and pass through its quality-limited and outage-limited regimes as we increase SNR until (38) is satisfied. As we will see, passing through the outage-limited regime of the multiplexing mode is the key limitation of optimal channel coding diversity for on-off channels. For higher SNR (such that (1 + SNR/2)^{2β} > (1 − ε)/(2ε)), we exploit the diversity mode of operation and pass through its quality- and outage-limited regimes as we increase SNR.
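To make the mode switch concrete, the sketch below computes both terms of (37) and the SNR at which (38) holds, then checks which mode is smaller on either side of it. The helper names and the values ε = 10^{-2}, β = 1 are illustrative assumptions; the closed form for the crossover requires ε < 1/3, so that (1 − ε)/(2ε) > 1.

```python
def opt_cc_terms(snr: float, eps: float, beta: float) -> tuple:
    """The two candidate distortions in (37): (diversity mode, multiplexing mode)."""
    x = 1 + snr / 2
    diversity = (1 - eps ** 2) * x ** (-2 * beta) + eps ** 2
    multiplexing = (1 - eps) ** 2 * x ** (-4 * beta) + (1 - (1 - eps) ** 2)
    return diversity, multiplexing


def crossover_snr(eps: float, beta: float) -> float:
    """SNR solving (38): (1 + SNR/2)^(2 beta) = (1 - eps) / (2 eps); assumes eps < 1/3."""
    return 2 * (((1 - eps) / (2 * eps)) ** (1 / (2 * beta)) - 1)


eps, beta = 1e-2, 1.0
s_star = crossover_snr(eps, beta)
for snr in (s_star / 2, s_star, 4 * s_star):
    div, mpx = opt_cc_terms(snr, eps, beta)
    mode = "multiplexing" if mpx < div else "diversity"
    print(f"SNR={snr:7.2f}  diversity={div:.4e}  multiplexing={mpx:.4e}  -> {mode}")
```

Below the crossover the multiplexing term is smaller, and above it the diversity term is smaller, matching the order in which the two modes are used as SNR grows.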

D. Source Coding Diversity 

In this section, we approximate the minimum average distortion for an MD system with independent 
channel coding. The analysis of this system is slightly more involved than those of previous sections 
because the rate-distortion region for MD coding is more complex, and independent channel coding over 
on-off component channels involves a pair of outage events. 

Similar to Fig. 7, Fig. 9 displays outage region boundaries for independent channel coding. It is straightforward to see that the source coder should employ rates no greater than ln(1 + SNR/2) on each of the component channels; otherwise, one of the channels exhibits outage with probability one, and the system can perform no better than the case of no diversity with half the SNR. As a result, our analysis only considers the case R_i ≤ ln(1 + SNR/2). Moreover, due to the symmetry of the component channels, one can expect symmetric rates, i.e., R_1 = R_2 = R, to be optimal; thus, we focus on this case. With these simplifications, we observe that, in contrast to the triangular outage regions for optimal parallel channel coding in Fig. 7, the rectangular outage regions for independent channel coding in Fig. 9 are well-matched to the on-off channel realizations.
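The outage events behind Figs. 7 and 9 can be tabulated directly from the four on-off state pairs. This Python sketch, with hypothetical helper names and an assumed ε = 0.1, recovers the outage probabilities ε² and 1 − (1 − ε)² for the two rates in (36), and shows that with independent channel coding a description is lost exactly when its own channel is off.

```python
from itertools import product


def state_probs(eps: float) -> dict:
    """Probabilities of the four (channel 1 on, channel 2 on) pairs for independent on-off channels."""
    return {(on1, on2): (1 - eps if on1 else eps) * (1 - eps if on2 else eps)
            for on1, on2 in product((True, False), repeat=2)}


def failure_prob(eps: float, succeeds) -> float:
    """Total probability of the states in which succeeds(state) is False."""
    return sum(p for state, p in state_probs(eps).items() if not succeeds(state))


eps = 0.1
# Parallel coding at R = ln(1 + SNR/2): the summed mutual information reaches R
# as soon as at least one channel is on (first case of (36)).
low_rate_outage = failure_prob(eps, lambda s: s[0] or s[1])
# Parallel coding at R = 2 ln(1 + SNR/2): both channels must be on (second case of (36)).
high_rate_outage = failure_prob(eps, lambda s: s[0] and s[1])
# Independent channel coding at R = ln(1 + SNR/2) per channel: description 1 is
# lost exactly when channel 1 is off, whatever channel 2 does (rectangular region).
desc1_lost = failure_prob(eps, lambda s: s[0])
print(low_rate_outage, high_rate_outage, desc1_lost)
```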

Optimizing average distortion for the MD system requires a tradeoff between the distortion D_1 = D_2 achieved when only one description is received and the joint distortion D_0 achieved when both descriptions are received. Although this tradeoff is available in (30), we refactor it for our purposes here. Specifically, we set

$$ D_1 = D_2 = \begin{cases} \exp(-(1-\lambda)2\beta R), & 0 \le \lambda < 1 \\ 1, & \lambda = 1 \end{cases} \qquad (39) $$

where R is the channel coding rate for a single channel. Thus, if λ = 0, the individual descriptions achieve the single description rate-distortion bound. With this parameterization of D_1 and D_2, the MD high-resolution approximation (30) yields

$$ D_0 \approx \begin{cases} \tfrac{1}{4}\exp(-(1+\lambda)2\beta R), & 0 \le \lambda < 1 \\ \exp(-4\beta R), & \lambda = 1 \end{cases} \qquad (40) $$

for the joint distortion when both descriptions are received. We note that an essentially identical approximation is developed in [16].

[Fig. 9: horizontal axis I(x_1;y_1), vertical axis I(x_2;y_2).]
Fig. 9. Outage region boundaries for MD source coding with independent channel coding. The marker symbols correspond to the sample mutual information pairs (0, 0), (0, ln(1 + SNR/2)), (ln(1 + SNR/2), 0), and (ln(1 + SNR/2), ln(1 + SNR/2)). The solid line corresponds to the outage region boundary for the first channel, and the dashed line corresponds to the outage region boundary for the second channel. The outage region for channel one (resp. channel two) is to the left of (resp. below) the boundary.

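The minimum average distortion for source coding diversity is then approximately

$$ D_{\text{scdiv}} \approx \min\left\{ \min_{0 \le \lambda < 1} \left[ \varepsilon^2 + 2\varepsilon(1-\varepsilon)(1+\mathrm{SNR}/2)^{-(1-\lambda)2\beta} + \frac{(1-\varepsilon)^2}{4}(1+\mathrm{SNR}/2)^{-(1+\lambda)2\beta} \right],\; [1-(1-\varepsilon)^2] + (1-\varepsilon)^2(1+\mathrm{SNR}/2)^{-4\beta} \right\}. \qquad (41) $$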

For λ = 1, source coding diversity performance reduces to that of channel coding diversity (the second term of (37)); for λ = 0, source coding diversity performance reduces to that of no diversity with half the SNR. Because optimization over λ does not lend much insight, we delay discussion of the source coding diversity quality- and outage-limited regimes to the next section, where we also compare with the other approaches.

Fig. 10. Average distortion performance over on-off channels. The plots show average distortion as a function of SNR; successively lower curves correspond to no diversity (dotted lines), optimal channel coding diversity (dashed lines), and source coding diversity (solid lines), respectively. Each plot corresponds to a different value of the probability ε of a component channel being off, and all are for β = 1.
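A minimal way to evaluate (41) numerically is to replace the inner minimization over λ by a grid search, as in the sketch below. The grid, the function name `d_sc_div`, and the sample parameters are assumptions made for illustration.

```python
def d_sc_div(snr: float, eps: float, beta: float, grid: int = 1000) -> float:
    """Approximate minimum average distortion (41) for source coding diversity.

    The inner minimization over lambda in [0, 1) is done on a uniform grid,
    a numerical shortcut rather than part of the analysis.
    """
    x = 1 + snr / 2
    md_branch = min(
        eps ** 2                                                   # both channels off
        + 2 * eps * (1 - eps) * x ** (-(1 - lam) * 2 * beta)       # one description, D1 from (39)
        + (1 - eps) ** 2 / 4 * x ** (-(1 + lam) * 2 * beta)        # both descriptions, D0 from (40)
        for lam in (k / grid for k in range(grid))
    )
    mpx_branch = (1 - (1 - eps) ** 2) + (1 - eps) ** 2 * x ** (-4 * beta)   # lambda = 1
    return min(md_branch, mpx_branch)


for snr_db in (0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)
    print(f"{snr_db:2d} dB  D_scdiv={d_sc_div(snr, 1e-2, 1.0):.3e}")
```

At SNR = 12 with ε = 10^{-2} and β = 1, for instance, the λ = 0 term alone is already well below both terms of (37), in line with the comparison that follows.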
E. Comparison 

Fig. 10 compares average distortion performance of source and channel coding diversity by displaying the minimum average distortions (33), (37), and (41) as functions of the component channel quality, SNR, in the on state, for different values of the probability ε of a component channel being off. The results in Fig. 10 are clearly consistent with our intuitive discussion of source and channel coding diversity performance in Section II. For moderate SNR, depending upon ε, both systems exhibit transitions from SNR^{−4β} behavior to SNR^{−2β} behavior; however, the transition is generally less drastic for source coding diversity, especially for smaller ε. The difference between the two systems is apparently the outage-limited behavior of the multiplexing mode for optimal channel coding diversity, for which the outage regions are not well-matched to the channel realizations. By contrast, the transition between the two quality-limited trends for source coding diversity is much less drastic, and this graceful degradation property is what gives source coding diversity its better performance over on-off channels. However, it is important to note that there is negligible difference between optimal channel coding diversity and source coding diversity at both low and high SNR.

IV. Continuous State Channels 

In cases where we are concerned with time- or frequency-selective multipath fading in a mobile radio channel or a range of possible interference levels in a cellular network, we can model the channel state S_j as taking on a continuum of values. For example, multiplicative fading is commonly modeled as a Rayleigh or Nakagami random variable in such scenarios. In the following sections we study the average mean square distortion in the limit of high SNR for such continuous channels when the channel state is known to the receiver but not the transmitter. Since the distortion generally behaves as SNR^{−Δ} for such channels, we are mainly interested in computing the distortion exponent defined as

$$ \Delta = -\lim_{\mathrm{SNR} \to \infty} \frac{\log E[D]}{\log \mathrm{SNR}}. \qquad (42) $$

Note that there is an important difference between the average or transmit signal-to-noise ratio which 
is deterministic and known by both transmitter and receiver and the instantaneous or block signal-to-noise 
ratio which is random and known only at the receiver. Throughout the rest of the paper, we always use 
SNR to refer to the former and consider the random, instantaneous signal-to-noise ratio as a random 
variable. 

In Section IV-G we plot the distortion exponents as well as the numerically computed average distortions for a Gaussian source transmitted over a complex Rayleigh fading additive white Gaussian noise channel. Hence the reader may find it useful to refer to Figs. 11 and 12 as a concrete example when comparing the following results for the performance of each system.

A. Continuous Channel Model 

For continuous state channels, the distribution of the mutual information random variable is generally difficult to compute exactly. For complex additive white Gaussian noise channels with multiplicative fading, however, the mutual information random variable is I = log(1 + a·SNR), where a corresponds to the multiplicative fading, normalized so that E[a] = 1 so that SNR is the transmit power or, equivalently, the average received power. For a·SNR ≫ 1, we have

$$ I = \log(a \cdot \mathrm{SNR}) + \log\left(1 + \frac{1}{a \cdot \mathrm{SNR}}\right) = \log(a \cdot \mathrm{SNR}) + O\left(\frac{1}{a \cdot \mathrm{SNR}}\right) \approx \log(a \cdot \mathrm{SNR}) $$

and so exp I is close to a·SNR.* Thus, for additive Gaussian noise channels with multiplicative fading, we can develop asymptotic results by considering the first terms in the Taylor series expansion of the distribution of a near zero. More generally, we can focus on the high SNR limit by considering the Taylor series expansion of the distribution of the mutual information random variable for each channel.

Specifically, let f_I(t) and F_I(t) represent the probability density function (PDF) and cumulative distribution function (CDF) for the mutual information, and let f_{e^I}(t) and F_{e^I}(t) represent the PDF and CDF for e^I.† We consider the case where there exists a parameter called SNR such that

$$ f_{e^I}(t) \approx \frac{c\,p}{\mathrm{SNR}} \left(\frac{t}{\mathrm{SNR}}\right)^{p-1} \quad (\text{with } p \ge 1) \qquad (43) $$

and consequently F_{e^I}(t) can be approximated via

$$ F_{e^I}(t) \approx c \left(\frac{t}{\mathrm{SNR}}\right)^{p}. \qquad (44) $$

Intuitively, SNR represents the transmit signal-to-noise ratio or the average signal-to-noise ratio, and F_{e^I}(t) is the probability that the instantaneous signal-to-noise ratio is below t. As introduced in Section II-E.1, the notion of approximation we use is that a(SNR) ≈ b(SNR) if lim_{SNR→∞} a(SNR)/b(SNR) = 1 and lim_{SNR→∞} |a(SNR) − b(SNR)| = 0.

For example, in wireless communications, a common model is an additive white Gaussian noise channel with fading:

$$ y[i] = a \cdot x[i] + z[i] \qquad (45) $$

where a represents the fading and z[i] represents additive noise. A common approach is to obtain robustness by coding over two separate frequency bands or time slots, in which case the channel model becomes

$$ y_1[i] = a_1 \cdot x_1[i] + z_1[i] $$
$$ y_2[i] = a_2 \cdot x_2[i] + z_2[i]. $$

If we are interested in Rayleigh fading, then each fading power gain |a_j|² has an exponential distribution and, at high SNR, the cumulative distribution function for exp I(y_j; x_j) is approximated by t/SNR; hence the parameters c and p in (44) are both unity (e.g., see [55], [56] for a discussion of such high SNR expansions).

*A similar expression can also be obtained for additive noise channels with non-Gaussian noise (e.g., using techniques from [53], [54]).

†Recall that we assume the mutual information optimizing input distribution is independent of the channel state. Hence it makes sense to speak of the mutual information distribution as given instead of a parameter controlled by the system designer.
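For the Rayleigh case, F_{e^I}(t) has a closed form, which makes the approximation in (44) easy to check. The sketch below assumes natural logarithms and a unit-mean exponential power gain a (so E[a] = 1), and compares the exact CDF of e^I = 1 + a·SNR with t/SNR and with a Monte Carlo estimate; the function names are hypothetical.

```python
import math
import random


def cdf_exp_mi(t: float, snr: float) -> float:
    """Exact Pr[exp(I) <= t] for I = ln(1 + a*SNR) with a ~ Exp(1)."""
    return 0.0 if t <= 1 else 1.0 - math.exp(-(t - 1) / snr)


def cdf_exp_mi_mc(t: float, snr: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(1 + rng.expovariate(1.0) * snr <= t for _ in range(n))
    return hits / n


snr = 1e4
for t in (10.0, 100.0, 1000.0):
    print(f"t={t:6.0f}  exact={cdf_exp_mi(t, snr):.4e}  "
          f"t/SNR={t / snr:.4e}  MC={cdf_exp_mi_mc(t, snr):.4e}")
```

The ratio of the exact CDF to t/SNR approaches one only when 1 ≪ t ≪ SNR, since the exact expression is 1 − exp(−(t − 1)/SNR); this is the high-SNR regime in which c = p = 1.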



B. No Diversity 

Perhaps the simplest case to consider is when there is only a single channel and no diversity is present. For such a scenario, a natural approach is cascading an SD source encoder/decoder ENC_{m←s}(·)/DEC_{s←m}(·) with a single channel encoder/decoder ENC_{x←m}(·)/DEC_{m←y}(·). In terms of our general joint source-channel coding notation, such a system has the encoder and decoder

$$ x = \mathrm{ENC}_{x \leftarrow s}(s) = \mathrm{ENC}_{x \leftarrow m}(\mathrm{ENC}_{m \leftarrow s}(s)) \qquad (46a) $$
$$ \hat{s} = \mathrm{DEC}_{s \leftarrow y}(y) = \begin{cases} \mathrm{DEC}_{s \leftarrow m}(\mathrm{DEC}_{m \leftarrow y}(y)), & \mathrm{DEC}_{m \leftarrow y}(y) \neq \emptyset \\ E[s], & \text{otherwise.} \end{cases} \qquad (46b) $$

Theorem 1: The distortion exponent for a system with no diversity described by (46) is

$$ \Delta_{\text{no-div}} = \frac{2\beta p}{2\beta + p}, \qquad (47) $$

where β is the processing gain defined in Section II-D.1 and p is the diversity order of the channel approximation in (44).

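Proof: Let R(D) = −ln(D)/(2β) denote the channel rate needed to achieve distortion D, so that exp R(D) = D^{−1/(2β)}. The average distortion is

$$ E[D] = \min_D \Pr[I(x;y) < R(D)] + (1 - \Pr[I(x;y) < R(D)]) \cdot D \qquad (48) $$
$$ = \min_D F_{e^I}(\exp R(D)) + [1 - F_{e^I}(\exp R(D))] \cdot D \qquad (49) $$
$$ \approx \min_D c\,\frac{D^{-p/(2\beta)}}{\mathrm{SNR}^p} + \left[1 - c\,\frac{D^{-p/(2\beta)}}{\mathrm{SNR}^p}\right] \cdot D \qquad (50) $$
$$ \approx \min_D c\,\frac{D^{-p/(2\beta)}}{\mathrm{SNR}^p} + D. \qquad (51) $$

Differentiating and setting equal to zero yields the minimizing distortion

$$ D^{*} = \left(\frac{c\,p}{2\beta}\right)^{2\beta/(2\beta+p)} \mathrm{SNR}^{-2\beta p/(2\beta+p)}. $$

Substituting this into (51) yields

$$ E[D] \approx C_{\text{no-div}} \cdot \mathrm{SNR}^{-2\beta p/(2\beta+p)}, \qquad (52) $$

where C_no-div represents a term independent of SNR. Thus the distortion exponent is 2βp/(2β + p).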
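The proof can be checked numerically: minimizing the right-hand side of (51) over a logarithmic grid of D at two large SNR values and measuring the log-log slope, as in definition (42), should recover 2βp/(2β + p). This is a sketch under the assumptions c = 1 and the listed (β, p) pairs; the grid range and the function names are our own choices.

```python
import math


def min_avg_distortion(snr: float, beta: float, p: float, c: float = 1.0) -> float:
    """Grid-search minimum over D of c * D^(-p/(2 beta)) / SNR^p + D, as in (51)."""
    best = math.inf
    for k in range(1, 4001):
        d = 10.0 ** (-k / 100)          # D from 10^-0.01 down to 10^-40
        best = min(best, c * d ** (-p / (2 * beta)) / snr ** p + d)
    return best


def slope_exponent(beta: float, p: float, snr_lo: float = 1e6, snr_hi: float = 1e8) -> float:
    """Estimate the distortion exponent (42) from two high-SNR points."""
    d_lo = min_avg_distortion(snr_lo, beta, p)
    d_hi = min_avg_distortion(snr_hi, beta, p)
    return -(math.log(d_hi) - math.log(d_lo)) / (math.log(snr_hi) - math.log(snr_lo))


for beta, p in ((1.0, 1.0), (2.0, 1.0), (0.5, 2.0)):
    print(beta, p, round(slope_exponent(beta, p), 4), round(2 * beta * p / (2 * beta + p), 4))
```

Because the minimum of (51) equals (1 + 2β/p)·D*, the measured slope matches the exponent of D* up to the grid resolution.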



C. Selection Channel Coding Diversity 

Perhaps the simplest approach to using two independent channels is to use SD source coding with 
repetition channel coding and selection combining. In this scheme, the encoder quantizes the source, s, to ŝ, adds channel coding to produce x, and repeats the result on both channels. The receiver decodes the higher quality channel and ignores the other. Formally, the encoder and decoder are given by

$$ (x_1, x_2) = \mathrm{ENC}_{x_1 x_2 \leftarrow s}(s) = (\mathrm{ENC}_{x \leftarrow m}(\mathrm{ENC}_{m \leftarrow s}(s)), \mathrm{ENC}_{x \leftarrow m}(\mathrm{ENC}_{m \leftarrow s}(s))) \qquad (53a) $$
$$ \hat{s} = \mathrm{DEC}_{s \leftarrow y_1, y_2}(y_1, y_2) = \begin{cases} \mathrm{DEC}_{s \leftarrow m}(\mathrm{DEC}_{m \leftarrow y}(y_1)), & \mathrm{DEC}_{m \leftarrow y}(y_1) \neq \emptyset \\ \mathrm{DEC}_{s \leftarrow m}(\mathrm{DEC}_{m \leftarrow y}(y_2)), & \mathrm{DEC}_{m \leftarrow y}(y_1) = \emptyset \text{ and } \mathrm{DEC}_{m \leftarrow y}(y_2) \neq \emptyset \\ E[s], & \text{otherwise} \end{cases} \qquad (53b) $$

where ENC_{m←s}(·)/DEC_{s←m}(·) correspond to the SD source encoder/decoder and ENC_{x←m}(·)/DEC_{m←y}(·) correspond to the single channel encoder/decoder. Thus, the quantized source signal will be recovered provided either channel is good. While such a scheme is suboptimal in terms of resource use, it is the simplest to understand and the easiest to implement. The following theorem (proved in Appendix A) characterizes asymptotic performance.

Theorem 2: The distortion exponent for a system with selection channel coding diversity described by (53) is

$$ \Delta_{\text{sel-ccdiv}} = \frac{2\beta p}{\beta + p}. \qquad (54) $$

D. Multiplexed Channel Coding Diversity 

A key drawback of repetition coding with selection combining is that it wastes the potential bandwidth 
of one channel in order to provide diversity. When the channel is usually good, such a scheme can be 
significantly sub-optimal. Hence, a complementary approach is channel multiplexing where the source is 
quantized using SD coding and this message is split over both channels. We define a channel multiplexing system as one with encoder and decoder given by

$$ (x_1, x_2) = \mathrm{ENC}_{x_1 x_2 \leftarrow s}(s) = (\mathrm{ENC}_{x_1 \leftarrow m_1}(\mathrm{ENC}_{m_1 \leftarrow s}(s)), \mathrm{ENC}_{x_2 \leftarrow m_2}(\mathrm{ENC}_{m_2 \leftarrow s}(s))) \qquad (55a) $$
$$ \hat{s} = \mathrm{DEC}_{s \leftarrow y_1, y_2}(y_1, y_2) = \begin{cases} \mathrm{DEC}_{s \leftarrow m}(\mathrm{DEC}_{m_1 \leftarrow y_1}(y_1), \mathrm{DEC}_{m_2 \leftarrow y_2}(y_2)), & \mathrm{DEC}_{m_1 \leftarrow y_1}(y_1) \neq \emptyset \text{ and } \mathrm{DEC}_{m_2 \leftarrow y_2}(y_2) \neq \emptyset \\ E[s], & \text{otherwise,} \end{cases} \qquad (55b) $$

where ENC_{x_i←m_i}(·)/DEC_{m_i←y_i}(·) correspond to single channel encoders/decoders and ENC_{m_i←s}(·) correspond to the first and second halves of the output of a single description source encoder with decoder DEC_{s←m}(·). If both channels are good enough to support successful decoding, then this scheme can transmit roughly twice the rate of a repetition coding system. The drawback is that, since either channel being bad can cause decoding failure, the system is less robust. The following theorem (proved in Appendix B) characterizes asymptotic performance.

Theorem 3: The distortion exponent for a system with multiplexed channel coding diversity described by (55) is

$$ \Delta_{\text{mpx-ccdiv}} = \frac{4 p \beta}{p + 4\beta}. \qquad (56) $$

Intuitively, we expect that when bandwidth is plentiful and outage is the dominating concern, the diversity provided by repetition coding is more important than the extra rate provided by channel multiplexing. When bandwidth is scarce, we expect the reverse to be true. We can verify this intuition by examining the distortion exponents in these two limits to obtain

$$ \lim_{\beta/p \to \infty} \frac{\Delta_{\text{sel-ccdiv}}}{\Delta_{\text{mpx-ccdiv}}} = 2 \qquad (57) $$
$$ \lim_{\beta/p \to 0} \frac{\Delta_{\text{sel-ccdiv}}}{\Delta_{\text{mpx-ccdiv}}} = \frac{1}{2}. \qquad (58) $$

The distortion exponents are equal if p = 2β.
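For reference, the exponents of Theorems 1 to 4 can be collected in a few lines of Python and used to confirm the limits (57) and (58) and the crossing at p = 2β (exactly, with rational arithmetic). The function names are hypothetical; the formulas are (47), (54), (56), and (60).

```python
from fractions import Fraction


def delta_no_div(beta, p):
    return 2 * beta * p / (2 * beta + p)      # Theorem 1, (47)


def delta_sel(beta, p):
    return 2 * beta * p / (beta + p)          # Theorem 2, (54)


def delta_mpx(beta, p):
    return 4 * beta * p / (p + 4 * beta)      # Theorem 3, (56)


def delta_opt(beta, p):
    return 4 * beta * p / (p + 2 * beta)      # Theorem 4, (60)


# Limits (57) and (58) of the selection/multiplexing ratio.
print(delta_sel(1e6, 1.0) / delta_mpx(1e6, 1.0))    # beta/p large: close to 2
print(delta_sel(1e-6, 1.0) / delta_mpx(1e-6, 1.0))  # beta/p small: close to 1/2
# Exact equality when p = 2 beta, checked with rational arithmetic.
b = Fraction(3, 7)
print(delta_sel(b, 2 * b) == delta_mpx(b, 2 * b))   # True
```

The ratio Δ_sel-ccdiv/Δ_mpx-ccdiv simplifies to (p + 4β)/(2(β + p)), which makes both limits and the crossing point easy to read off.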

E. Optimal Channel Coding Diversity 

Each of the previous schemes used SD source coding with some form of independent channel coding and hence was suboptimal. With SD source coding, the optimal strategy is to use parallel channel coding. In this scheme, the two component channels are treated as a single parallel channel, with channel encoding and decoding performed jointly over both. Specifically, we define optimal channel coding diversity as

$$ (x_1, x_2) = \mathrm{ENC}_{x_1 x_2 \leftarrow s}(s) = \mathrm{ENC}_{x_1 x_2 \leftarrow m}(\mathrm{ENC}_{m \leftarrow s}(s)) \qquad (59a) $$
$$ \hat{s} = \mathrm{DEC}_{s \leftarrow y_1, y_2}(y_1, y_2) = \begin{cases} \mathrm{DEC}_{s \leftarrow m}(\mathrm{DEC}_{m \leftarrow y_1, y_2}(y_1, y_2)), & \mathrm{DEC}_{m \leftarrow y_1, y_2}(y_1, y_2) \neq \emptyset \\ E[s], & \text{otherwise} \end{cases} \qquad (59b) $$

where ENC_{m←s}(·)/DEC_{s←m}(·) correspond to the SD source encoder/decoder and ENC_{x_1x_2←m}(·)/DEC_{m←y_1,y_2}(·) correspond to the parallel channel encoder/decoder. Since parallel channel coding optimally uses the channel resources, it dominates both repetition coding with selection combining and channel multiplexing, as characterized by the following theorem (proved in Appendix C).

Theorem 4: The distortion exponent for a system with optimal channel coding diversity described by (59) is

$$ \Delta_{\text{opt-ccdiv}} = \frac{4 p \beta}{p + 2\beta}. \qquad (60) $$

F. Source Coding Diversity 

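Next, we consider the case where the source is transmitted over a pair of independent channels using MD source coding. Specifically, we consider a system with

$$ (x_1, x_2) = \mathrm{ENC}_{x_1 x_2 \leftarrow s}(s) = (\mathrm{ENC}_{x_1 \leftarrow m_1}(\mathrm{ENC}_{m_1 \leftarrow s}(s)), \mathrm{ENC}_{x_2 \leftarrow m_2}(\mathrm{ENC}_{m_2 \leftarrow s}(s))) \qquad (61a) $$
$$ \hat{s} = \mathrm{DEC}_{s \leftarrow y_1, y_2}(y_1, y_2) = \begin{cases} \mathrm{DEC}_{s \leftarrow m_1}(\mathrm{DEC}_{m_1 \leftarrow y_1}(y_1)), & \mathrm{DEC}_{m_1 \leftarrow y_1}(y_1) \neq \emptyset \text{ and } \mathrm{DEC}_{m_2 \leftarrow y_2}(y_2) = \emptyset \\ \mathrm{DEC}_{s \leftarrow m_2}(\mathrm{DEC}_{m_2 \leftarrow y_2}(y_2)), & \mathrm{DEC}_{m_1 \leftarrow y_1}(y_1) = \emptyset \text{ and } \mathrm{DEC}_{m_2 \leftarrow y_2}(y_2) \neq \emptyset \\ \mathrm{DEC}_{s \leftarrow m_1, m_2}(\mathrm{DEC}_{m_1 \leftarrow y_1}(y_1), \mathrm{DEC}_{m_2 \leftarrow y_2}(y_2)), & \mathrm{DEC}_{m_1 \leftarrow y_1}(y_1) \neq \emptyset \text{ and } \mathrm{DEC}_{m_2 \leftarrow y_2}(y_2) \neq \emptyset \\ E[s], & \mathrm{DEC}_{m_1 \leftarrow y_1}(y_1) = \emptyset \text{ and } \mathrm{DEC}_{m_2 \leftarrow y_2}(y_2) = \emptyset \end{cases} \qquad (61b) $$

where ENC_{m_1←s}(·) and ENC_{m_2←s}(·) represent the two quantizations of the source produced by the MD source coder, DEC_{s←m_i}(·) represent the possible source decoders described in Tab. I, and ENC_{x_i←m_i}(·)/DEC_{m_i←y_i}(·) correspond to single channel encoders/decoders. The performance of such a system is characterized by Theorem 5 (proved in Appendix D).

Theorem 5: The distortion exponent for source coding diversity as described by (61) is

$$ \Delta_{\text{scdiv}} = \max\left\{ \frac{8\beta p}{4\beta + 3p}, \frac{4\beta p}{p + 4\beta} \right\}. \qquad (62) $$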
When p < 4β, MD source coding achieves diversity in the sense that if either channel is bad but the other is good, a coarse-grained description of the source can be reconstructed, while if both channels are good, a fine-grained description can be reconstructed. Therefore, in this regime, source coding diversity dominates suboptimal channel coding diversity because it takes advantage of the redundancy between descriptions at the source coding layer.

When p > 4β, however, the max in (62) selects the second term. In this regime, it is more important to maximize the transmitted rate than to protect against fading. Thus source coding diversity degenerates into multiplexed channel coding diversity as analyzed in Section IV-D.

In both regimes, optimal channel coding diversity dominates source coding diversity.
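One way to see where the two terms of (62) come from is a heuristic exponent-balancing calculation, sketched below; it is not the proof of Appendix D. We assume (our assumption, in the spirit of (44)) a channel rate R = r ln SNR, independent outages on each channel with probability of order SNR^{−p(1−r)}, and the distortions (39) and (40) with constants dropped, and then maximize over r and λ the smallest SNR exponent among the three distortion terms.

```python
def scdiv_exponent_grid(beta: float, p: float, n: int = 200) -> float:
    """Heuristic grid evaluation of the exponent in (62).

    Assumptions (ours): channel rate R = r ln SNR, each channel in outage
    independently with probability ~ SNR^(-p (1 - r)), and constants such as c
    and the factor 1/4 in (40) ignored, since they do not affect exponents.
    """
    best = 0.0
    for i in range(1, n):
        r = i / n
        for j in range(n + 1):
            lam = j / n
            one_out = p * (1 - r) + (1 - lam) * 2 * beta * r   # one channel out, times D1 from (39)
            both_in = (1 + lam) * 2 * beta * r                 # D0 from (40)
            both_out = 2 * p * (1 - r)                         # both channels out
            best = max(best, min(one_out, both_in, both_out))
    return best


def scdiv_exponent_formula(beta: float, p: float) -> float:
    return max(8 * beta * p / (4 * beta + 3 * p), 4 * beta * p / (p + 4 * beta))


for beta, p in ((1.0, 1.0), (0.1, 1.0), (2.0, 3.0), (0.25, 2.0)):
    print(beta, p, round(scdiv_exponent_grid(beta, p), 3), round(scdiv_exponent_formula(beta, p), 3))
```

The grid search agrees with the closed form to within the grid resolution. Under these assumptions, the first term of (62) is attained by balancing all three exponents at λ = 1/3 and r = 3p/(4β + 3p), while the second term corresponds to λ = 1, the multiplexing case.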

G. Rayleigh Fading AWGN Example 

In this section, we evaluate the various distortion exponents on a complex Rayleigh fading additive white Gaussian noise (AWGN) channel. The high SNR approximation for the mutual information on each Rayleigh fading AWGN channel is F_{e^I}(t) ≈ t/SNR, i.e., p = 1 in (44) (e.g., see [55], [56] for a discussion of such high SNR expansions).

The resulting distortion exponents are summarized‡ in Tab. II and plotted in Fig. 11. When the processing gain is small (i.e., β ≪ 1), multiplexed and optimal channel coding diversity as well as source coding diversity all approach a distortion exponent of 4β, while selection channel coding diversity and no diversity both approach distortion exponents of 2β. Intuitively, this occurs because, since bandwidth is scarce, a good system should try to maximize the information communicated by sending different information on each channel. Multiplexed coding does this by sending different information on each channel using the same code, optimal channel coding does this by using a different code for each channel, and multiple description coding does this by sending different source descriptions on each channel. Since neither selection diversity nor no diversity provides any multiplexing gain (in the sense of [48]), both of these systems achieve the same suboptimal distortion exponent.

When the processing gain is large (i.e., β ≫ 1), selection and optimal channel coding diversity as well as source coding diversity all approach a distortion exponent of 2, while systems with multiplexed channel coding diversity or no diversity achieve a smaller distortion exponent of 1. Intuitively, this occurs because, since bandwidth is plentiful, even one good channel provides plenty of rate to send a satisfactory description of the source. Thus good systems should try to maximize robustness by being able to decode even if one channel fails completely.

‡The distortion exponents in this paper are slightly different than in [49] due to different definitions of the processing gain, as described in Section II-D.1.

At both extremes of processing gain, the best distortion exponent can be achieved either by exploiting 
diversity at the physical layer via parallel channel coding or at the application layer via multiple description 
coding. In some sense, this suggests that both physical layer and application layer systems are flexible 
enough to incorporate the main principles of diversity for continuous channels. Other sub-optimal schemes 
such as selection channel coding diversity are less flexible in that they only incorporate a subset of the 
important principles of diversity and thus approach the best distortion exponent in at most one extreme of 
processing gain. For all processing gains, however, optimal channel coding diversity is superior to source 
coding diversity, suggesting that the application layer system is missing something. In Section V, we show that the loss of source coding diversity is essentially caused by separating the process of channel decoding from source decoding.
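The p = 1 expressions in Tab. II can be evaluated directly to confirm the limiting behavior described above: about 4β versus 2β for small β, and 2 versus 1 for large β. The dictionary layout and the sample β values in this Python sketch are illustrative choices.

```python
TABLE_II = {   # distortion exponents for p = 1, as functions of beta
    "no diversity": lambda b: 2 * b / (2 * b + 1),
    "selection channel coding diversity": lambda b: 2 * b / (b + 1),
    "multiplexed channel coding diversity": lambda b: 4 * b / (4 * b + 1),
    "optimal channel coding diversity": lambda b: 4 * b / (2 * b + 1),
    "source coding diversity": lambda b: max(8 * b / (4 * b + 3), 4 * b / (4 * b + 1)),
}

for beta in (1e-3, 1e-1, 1.0, 10.0, 1e3):
    values = {name: f(beta) for name, f in TABLE_II.items()}
    best = max(values, key=values.get)
    print(f"beta={beta:g}: largest exponent from {best} ({values[best]:.4f})")
```

For every β the largest entry is optimal channel coding diversity, consistent with the observation that it is superior to source coding diversity at all processing gains.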



TABLE II
Distortion exponents (Rayleigh fading AWGN example, p = 1).

System                                                 Δ
No Diversity (Section IV-B)                            2β/(2β + 1)
Selection Channel Coding Diversity (Section IV-C)      2β/(β + 1)
Multiplexed Channel Coding Diversity (Section IV-D)    4β/(4β + 1)
Optimal Channel Coding Diversity (Section IV-E)        4β/(2β + 1)
Source Coding Diversity (Section IV-F)                 max[8β/(4β + 3), 4β/(4β + 1)]



Fig. 12 shows the average distortion for various systems transmitting over complex Rayleigh fading AWGN channels with β = 1, where the parameters in the rate optimizations have been numerically computed for each system using the high SNR approximations. As the plot indicates, the difference in performance suggested by the asymptotic results in Tab. II becomes evident even at reasonable SNR. Indeed, as the figure shows, optimal channel coding diversity is always superior to source coding diversity and achieves an advantage of a few dB at moderate SNR. Source coding diversity is superior to selection diversity by a similar margin. In contrast, Fig. 10 shows that for on-off channels, source coding diversity is always better than optimal channel coding diversity. Evidently, none of the systems considered so far is universally optimal, and the best way to achieve diversity depends on the qualitative features of the channel. In the next section, we consider a joint source-channel coding system which we show achieves the benefits of source coding diversity for on-off channels and the benefits of optimal channel coding diversity for continuous state channels.

[Fig. 11: horizontal axis "Bandwidth Expansion (dB)", from −10 dB to 10 dB.]
Fig. 11. Distortion exponents as a function of bandwidth expansion factor β in decibels. From top to bottom on the right hand side, the curves correspond to optimal channel coding diversity (Section IV-E), source coding diversity (Section IV-F), selection channel coding diversity (Section IV-C), multiplexed channel coding diversity (Section IV-D), and no diversity (Section IV-B).

V. Source Coding Diversity with Joint Decoding 

In this section we consider source coding diversity with a joint decoder that uses the redundancy in 
both the source coder and channel coder to decode the received signal. Specifically, we define source 
coding diversity with joint decoding to have encoder and decoder 

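$$ (x_1, x_2) = \mathrm{ENC}_{x_1 x_2 \leftarrow s}(s) = (\mathrm{ENC}_{x_1 \leftarrow m_1}(\mathrm{ENC}_{m_1 \leftarrow s}(s)), \mathrm{ENC}_{x_2 \leftarrow m_2}(\mathrm{ENC}_{m_2 \leftarrow s}(s))) \qquad (63a) $$
$$ \hat{s} = \mathrm{DEC}_{s \leftarrow y_1, y_2}(y_1, y_2) \qquad (63b) $$

where ENC_{x_1←m_1}(·)/ENC_{x_2←m_2}(·) are single channel encoders (potentially, but not necessarily, with different codes), ENC_{m_1←s}(·)/ENC_{m_2←s}(·) are MD source encoders, and DEC_{s←y_1,y_2}(·) is a joint source-channel decoder to be described in the sequel.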

The motivation for joint source-channel decoding is illustrated by considering the conceptual diagram of an MD quantizer in Fig. 13. Since the two quantization indexes ENC_{m_1←s}(s) and ENC_{m_2←s}(s) are correlated, the channel decoder should take this correlation into account. For example, if one channel is good and y_1 is accurately decoded to m_1 = ENC_{m_1←s}(s), this decreases the number of possible values for m_2 and makes decoding y_2 easier.

Fig. 12. Average distortion performance on a complex Rayleigh fading additive white Gaussian noise channel with processing gain β = 1. From top to bottom on the right hand side, the curves correspond to no diversity (Section IV-B), multiplexed channel coding diversity (Section IV-D), selection channel coding diversity (Section IV-C), source coding diversity (Section IV-F), and optimal channel coding diversity (Section IV-E).
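A toy index assignment illustrates the list-shrinking effect. In the sketch below (our assumption; it does not reproduce the exact bin layout of Fig. 13), central cell k is labeled with side indexes m_1 = ⌈k/2⌉ and m_2 = ⌊k/2⌋, so each legal pair overlaps its neighbor. Once m_1 is known, at most two values of m_2 remain, out of five in this example; the function names are hypothetical.

```python
def index_pairs(num_cells: int) -> dict:
    """Map central cell k to side indexes (m1, m2); only these pairs are legal."""
    return {k: ((k + 1) // 2, k // 2) for k in range(num_cells)}


def candidates_for_m2(m1: int, pairs: dict) -> list:
    """Values of m2 consistent with a correctly decoded m1."""
    return sorted({b for (a, b) in pairs.values() if a == m1})


pairs = index_pairs(9)
num_m2_values = len({b for _, b in pairs.values()})
print(num_m2_values, candidates_for_m2(2, pairs))   # 5 [1, 2]
```

Because every legal pair is distinct, the central cell (the fine description) is recovered whenever both indexes are known.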

We show that a joint decoder that exploits this correlation can enlarge the region where both m_1 and m_2 are successfully decoded. Specifically, with separate decoding, both descriptions are decoded when both I(x_1;y_1) and I(x_2;y_2) exceed some rate threshold R_t, which is denoted as region III in Fig. 3. A joint decoder, however, also recovers both descriptions in region II, yielding the decoding regions shown in Fig. 14. With these enlarged decoding regions, we show that source coding diversity with joint source-channel decoding achieves the same performance as optimal channel coding diversity for continuous channels, in addition to providing the benefits of source coding diversity for on-off channels.

A. System Description 

Next we describe one way to implement the architecture in (63) using an information theoretic formulation and random coding arguments.
Fig. 13. Conceptual diagram of an MD quantizer. The source s is mapped to the quantizer bins labeled m_1 = ENC_{m_1←s}(s) and m_2 = ENC_{m_2←s}(s). Since only overlapping pairs of indexes are legal quantization values, if a receiver accurately decodes m_1 from the channel output y_1, then there are only two possible values for m_2 in decoding a second channel output y_2.

[Fig. 14: horizontal axis I(x_1;y_1); regions labeled "Outage", "Decode m_1", and "Decode Both m_1 and m_2".]
Fig. 14. Decoding regions for a joint source-channel decoder.



1) Source Encoding: Choose a test-channel distribution $P_{\hat{s}_1,\hat{s}_2|s}(\hat{s}_1,\hat{s}_2|s)$ with the marginal distributions

$$P_{\hat{s}_i}(\hat{s}_i) = \sum_{s}\sum_{\hat{s}_j,\, j \neq i} P_{\hat{s}_1,\hat{s}_2|s}(\hat{s}_1,\hat{s}_2|s)\, p_s(s), \qquad i \in \{1,2\}. \qquad (64)$$

Create a pair of rate-$R$ random source codebooks, $\mathcal{C}^{s}_1$ and $\mathcal{C}^{s}_2$, by randomly generating $\exp(n_s R)$ sequences
of length $n_s$ according to the i.i.d. test-channel distributions $P_{\hat{s}_i}(\hat{s}_i)$. To encode a source sequence $s$, find a pair of
codewords $\hat{s}_1 \in \mathcal{C}^{s}_1$, $\hat{s}_2 \in \mathcal{C}^{s}_2$ such that the triple $(\hat{s}_1, \hat{s}_2, s)$ is strongly typical. According to [28], encoding
will succeed with probability approaching one if*

$$R > I(\hat{s}_1; s) \qquad (65a)$$
$$R > I(\hat{s}_2; s) \qquad (65b)$$
$$2R > I(s; \hat{s}_1\hat{s}_2) + I(\hat{s}_1; \hat{s}_2). \qquad (65c)$$

*Note that [28] also includes a term $\hat{s}_0$, which can be ignored for our purposes (i.e., $\hat{s}_0$ can be set to null or to a constant such as 0).

2) Channel Encoding: For each channel $j$, generate a rate-$R$ random codebook, $\mathcal{C}^{c}_j$, by randomly
selecting $\exp(n_s R)$ sequences (or equivalently $\exp(n_c R/\beta)$ sequences) of length $n_c$ according to the
i.i.d. distribution $P_x(x)$. Encode the source codeword in the $i$th row of $\mathcal{C}^{s}_j$ by mapping it to the $i$th
channel codeword in $\mathcal{C}^{c}_j$.

3) Joint Decoding: Denote the output of channel $j$ as $y_j$ for $j \in \{1,2\}$. To decode, create the lists $\mathcal{L}_1$
and $\mathcal{L}_2$ by finding all channel codewords $x_j \in \mathcal{C}^{c}_j$ such that the pair $(x_j, y_j)$ is typical with respect to
the distribution $P_{y_j,x_j|a_j}(y_j,x_j|a_j)$. Next, search for a unique pair of codewords $(x_1, x_2)$ with $x_1 \in \mathcal{L}_1$ and
$x_2 \in \mathcal{L}_2$ such that the corresponding source codewords $(\hat{s}_1, \hat{s}_2)$ are typical with respect to the distribution
$P_{\hat{s}_1,\hat{s}_2}(\hat{s}_1,\hat{s}_2)$. If a unique pair is found, output the resulting source reconstructions. Otherwise, declare a
decoding error.

4) Probability of Error: The following theorem provides an achievable rate for source coding diversity
with joint decoding.

Theorem 6: Joint decoding will succeed with probability approaching one if

$$\max[0, R - \beta I(x_1;y_1)] + \max[0, R - \beta I(x_2;y_2)] < I(\hat{s}_1;\hat{s}_2). \qquad (66)$$

Proof: Decoding can fail if either the correct pair of source codewords is not typical or an
incorrect pair of source codewords is typical. According to the law of large numbers, the probability of
the former event tends to zero as the block length increases. Therefore, the union bound implies that if
the probability of the latter tends to zero, then the total probability of a decoding error also tends to zero.

The probability that an incorrect channel codeword $x_j$ is typical with $y_j$ according to $P_{y_j,x_j|a_j}(y_j,x_j|a_j)$
is roughly $\exp[-n_c I(x_j;y_j)]$. Since there are $\exp(n_s R)$ possible codewords for each channel, the expected
list sizes are

$$E\big[|\mathcal{L}_j|\big] = 1 + \exp[n_s R - n_c I(x_j;y_j)] + \epsilon \qquad (67)$$

where the "1" corresponds to the correct channel codeword and $\epsilon$ denotes a quantity which goes to 0.
Using standard arguments, it is possible to show that the actual list sizes will be close to the expected
list size with probability approaching one.

The probability that an incorrect pair of source codewords $(\hat{s}_1, \hat{s}_2)$, corresponding to a channel
codeword pair $(x_1, x_2)$ with $x_j \in \mathcal{L}_j$, is typical is roughly $\exp[-n_s I(\hat{s}_1;\hat{s}_2)]$. Multiplying this probability
by the number of incorrect pairs yields the expected number of incorrect pairs which are nonetheless
typical:

$$\exp\{-n_s I(\hat{s}_1;\hat{s}_2) + \max[0, n_s R - n_c I(x_1;y_1)] + \max[0, n_s R - n_c I(x_2;y_2)]\}. \qquad (68)$$

Therefore, after dividing the exponent by $n_s$ and recalling that the processing gain is defined as $\beta = n_c/n_s$,
we conclude that decoding succeeds provided that (66) holds. ■
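To make condition (66) concrete, the following Python sketch compares it with separate decoding on a pair of i.i.d. fading channels. It is an illustration under stated assumptions, not the authors' code: we assume $I(x_j;y_j) = \ln(1 + \mathrm{SNR}\, g_j)$ with $g_j$ unit-mean exponential (Rayleigh fading), measure rates in nats, and plug in $I(\hat{s}_1;\hat{s}_2) \approx R - \frac{1}{2}\ln 2$, the high-resolution value suggested by Lemma 3 in the Appendix. All function names are hypothetical.

```python
import math
import random


def separate_ok(i1, i2, rate, beta):
    """Separate decoding recovers both descriptions only if each channel
    alone supports rate/beta nats per channel use."""
    return i1 >= rate / beta and i2 >= rate / beta


def joint_ok(i1, i2, rate, beta, i12):
    """Sufficient condition (66) for joint decoding of both descriptions.

    i1, i2 -- I(x_1;y_1), I(x_2;y_2) in nats per channel use
    rate   -- source codebook rate R in nats per source sample
    beta   -- processing gain n_c / n_s
    i12    -- I(s1_hat; s2_hat) in nats per source sample
    """
    excess = max(0.0, rate - beta * i1) + max(0.0, rate - beta * i2)
    return excess < i12


def failure_rates(snr, rate, beta, i12, trials=100_000, seed=1):
    """Monte Carlo estimate of Pr[both descriptions are NOT recovered]
    for separate and joint decoding under the assumed Rayleigh model."""
    rng = random.Random(seed)
    sep_fail = joint_fail = 0
    for _ in range(trials):
        i1 = math.log1p(snr * rng.expovariate(1.0))
        i2 = math.log1p(snr * rng.expovariate(1.0))
        sep_fail += int(not separate_ok(i1, i2, rate, beta))
        joint_fail += int(not joint_ok(i1, i2, rate, beta, i12))
    return sep_fail / trials, joint_fail / trials


if __name__ == "__main__":
    R = 4.0
    print(failure_rates(snr=100.0, rate=R, beta=1.0, i12=R - 0.5 * math.log(2)))
```

Whenever $I(\hat{s}_1;\hat{s}_2) > 0$, every channel pair accepted by separate decoding is also accepted by (66), so the estimated joint failure rate cannot exceed the separate one; the difference corresponds to the enlarged decoding region discussed at the start of this section.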

B. Performance 

In order to analyze performance, we must first choose a distribution for the source and channel
codebooks. Naturally, we choose the capacity-optimizing input distribution for each channel codebook.
For the source codebook distribution, we use a simpler form of the additive noise test-channel in [28]:

$$\hat{s}_j = s + n_j \qquad (69)$$

where $(n_1, n_2)$ is a pair of zero-mean, variance-$\sigma^2$ Gaussian random variables independent of $s$ and
of each other. For this distribution, the distortion when using only description $j$ is $D_j \le \sigma^2$. When both
descriptions are received, they can be averaged to yield distortion $D_{1,2} \le \sigma^2/2$.
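A quick Monte Carlo check of the two distortion bounds for the test channel (69) is sketched below. It assumes a unit-variance Gaussian source and uses the descriptions themselves (and their average) as reconstructions, with no further estimation step. The function name is hypothetical.

```python
import random


def empirical_distortions(sigma2, n=200_000, seed=0):
    """Empirical MSE of s_hat_1 alone and of (s_hat_1 + s_hat_2)/2 under
    the additive test channel s_hat_j = s + n_j, n_j ~ N(0, sigma2)."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    d1 = d12 = 0.0
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)           # source sample (assumed N(0, 1))
        s1 = s + rng.gauss(0.0, sd)       # description 1
        s2 = s + rng.gauss(0.0, sd)       # description 2, independent noise
        d1 += (s1 - s) ** 2               # only one description received
        d12 += ((s1 + s2) / 2 - s) ** 2   # both received and averaged
    return d1 / n, d12 / n


if __name__ == "__main__":
    print(empirical_distortions(0.01))    # approximately (0.01, 0.005)
```

Because the source cancels in both errors, the result does not depend on the source distribution; the factor-of-two gain comes entirely from the independence of $n_1$ and $n_2$.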

1) Performance on Continuous Channels: To derive the performance on continuous channels, we must
choose $\sigma^2$ as a function of the channel parameters. The choice of $\sigma^2$ determines the rate and hence also
the probability of outage and the distortion exponent. Our goal is to show that source coding diversity
with joint decoding achieves the same distortion exponent as optimal channel coding diversity. Hence,
instead of solving an optimization problem to determine $\sigma^2$, we make an educated guess inspired by
(llOU to choose*^ 

$$\sigma^2 = \mathrm{SNR}^{-\Delta_{\mathrm{OPT\text{-}CCDIV}}}. \qquad (70)$$

*Technically, it would be better to choose $\sigma^2$ to be proportional to the right hand side of (70) with a complicated proportionality
constant. Since distortion exponent analysis essentially ignores constant factors, however, we ignore this refinement to simplify
the exposition.
Theorem 7: The distortion exponent for source coding diversity with joint decoding is at least as good
as that for optimal channel coding diversity:

$$\Delta_{\mathrm{SCDIV\text{-}JD}} \ge \Delta_{\mathrm{OPT\text{-}CCDIV}}. \qquad (71)$$

Note that to achieve the distortion exponent in the previous theorem, the multiple description source
redundancy is used in two qualitatively different ways. First, the redundancy between $x_1$ and $x_2$ is used
to recover the two source descriptions. In this sense, the source coding redundancy acts like channel
coding redundancy in providing robustness to noise.

Second, the redundancy between $\hat{s}_1$ and $\hat{s}_2$ is used to produce a better source reconstruction by combining
the two descriptions. For example, [7] describes a system where the quantization noise for each description
is independent of the source, so averaging the two descriptions reduces the quantization noise power
by half. Regardless of how the two descriptions are combined into a higher resolution description,
however, the key benefit of joint source-channel decoding is that it exploits the redundancy
required by multiple description coding at both the channel decoding stage and the source
decoding stage.

VI. Concluding Remarks 

We considered various architectures to minimize the average distortion in transmitting a source over 
independent parallel channels. Conceptually, we view the overall channel quality encountered by a system 
as a two-dimensional random variable where the two axes correspond to the Shannon mutual information 
for each channel. As illustrated in Fig. 3, the different architectures considered essentially correspond to
systems which perform well when the channel quality is in a certain part of this two-dimensional mutual
information plane. Thus, minimizing the distortion for a given channel model corresponds to choosing an
architecture matched to the shape of the overall channel mutual information distribution.

For on-off channel models, where a channel either fails completely or functions normally, the overall 
channel mutual information takes values on the Cartesian product of a finite set. This shape is well 
matched to source coding diversity, i.e., MD source coding and independent channel coding, that exploits 
diversity at the application layer. Specifically, in the high SNR regime, it is essential that both channels
carry redundant information so that if one channel fails the signal can still be decoded from the surviving 
channel. This forces channel coding diversity to use complete redundancy, and so the distortion when 
both channels are on is the same as when only one channel is on. In contrast, source coding diversity can 
use only partial redundancy by sending slightly different signals on each channel. When both channels 
are on, the differences in the two received descriptions lead to a higher resolution reconstruction and 
lower distortion. Therefore, source coding diversity achieves substantially better performance than channel
coding diversity as illustrated in Fig. \W\ 

In contrast, for fading, shadowing, and similar effects, the overall channel mutual information takes on a
continuous range of values. This shape is better suited to optimal channel coding diversity, which exploits diversity at
the physical layer. Specifically, in the high SNR regime, optimal channel coding diversity takes advantage
of redundancy between the information transmitted across each channel while source coding diversity
with separate decoding cannot. As one of our main results, we showed that for such channels the average
distortion asymptotically behaves as $\mathrm{SNR}^{-\Delta}$. In particular, we calculated the distortion exponent $\Delta$ for
various architectures and showed that the distortion exponent for optimal channel coding diversity is
strictly better than that for source coding diversity.
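To make this comparison concrete, the sketch below collects the distortion exponents derived in the Appendix ((77), (83), and (103), together with the source coding diversity result of Appendix D) as functions of the channel parameter $p$ and the processing gain $\beta$. It then checks that optimal channel coding diversity has the largest of these exponents. This is a convenience sketch with hypothetical names. It omits source coding diversity with joint decoding, for which Theorem 7 guarantees only a lower bound equal to the optimal channel coding exponent.

```python
def distortion_exponents(p, beta):
    """Distortion exponents from the Appendix, keyed by architecture."""
    return {
        "selection CCDIV": 2 * p * beta / (p + beta),                      # (77)
        "multiplexed CCDIV": 4 * p * beta / (p + 4 * beta),                # (83)
        "source coding diversity": max(8 * beta * p / (4 * beta + 3 * p),  # Appendix D
                                       4 * beta * p / (4 * beta + p)),
        "optimal CCDIV": 4 * p * beta / (p + 2 * beta),                    # (103)
    }


if __name__ == "__main__":
    ranked = sorted(distortion_exponents(1.0, 1.0).items(), key=lambda kv: kv[1])
    for name, value in ranked:
        print(f"{name:25s} {value:.4f}")
```

For $p = \beta = 1$ the exponents increase from multiplexed (0.8) to selection (1.0) to source coding diversity (8/7) to optimal channel coding diversity (4/3). This matches the order in which these curves appear from top to bottom in the Fig. 12 caption.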

Finally, we demonstrated that there is no inherent flaw in source coding diversity on continuous
channels. Instead, the inferior distortion exponent of source coding diversity is due to the sub-optimality
of separate source and channel decoding. If joint source-channel decoding is allowed, source coding
diversity achieves the same distortion exponent as optimal channel coding diversity. Thus, for the
non-ergodic channels considered in this paper, Shannon's source-channel separation theorem fails,* and the
best overall performance is achieved by a joint source-channel architecture using multiple description
coding.

*We believe that the main value of Shannon's original source-channel separation theorem was in showing that bits are
a sufficient currency between source and channel coding systems. Thus, even though the system in Section V has separate
encoding and only the decoding is performed jointly, we say that the separation theorem breaks down because exchanging
bits is no longer sufficient. Specifically, such a joint decoding system would need to pass lists, log-likelihood ratios, or similar
information from the channel coding layer to the source coding layer.

While this paper explores a variety of architectures, many aspects of the detailed design, analysis, and
implementation of such systems remain to be addressed. On the information theory side, determining
the best possible average distortion, or at least lower bounds to the best distortion, would be a valuable
step. Similarly, determining the performance for architectures using broadcast channel codes combined
with successive refinement source codes, hybrid digital-analog codes, or other joint source-channel
architectures would be interesting. Also, determining second-order performance metrics beyond the distortion
exponent would be useful in designing practical systems. Some issues of interest in signal processing and
communication theory include developing practical codes achieving the theoretical advantages of joint
source-channel decoding, generalizing the results in this paper to sources with memory or correlated
channels (e.g., as found in multiple antenna systems), and studying the effect of imperfect channel state
information at the receiver. Finally, a wide array of similar questions arise in a variety of network problems
such as relay channels, multi-hop channels, and interference channels. For network scenarios, both the
number of possible architectures and the advantages of sophisticated systems will be larger.
Appendix

A. Distortion Exponent For Selection Channel Coding Diversity 

Proof of Theorem^ The minimum expected distortion for such a scheme is computed as follows: 

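$$E[D] = \min_D \Pr\{\max[I(x_1;y_1), I(x_2;y_2)] < R(D)\} + \Pr\{\max[I(x_1;y_1), I(x_2;y_2)] \ge R(D)\}\cdot D \qquad (72)$$
$$= \min_D F_{e^{I}}(\exp R(D))^2 + \left[1 - F_{e^{I}}(\exp R(D))^2\right]\cdot D \qquad (73)$$
$$\approx \min_D c^2 D^{-p/\beta}\,\mathrm{SNR}^{-2p} + \left(1 - c^2 D^{-p/\beta}\,\mathrm{SNR}^{-2p}\right)\cdot D \qquad (74)$$
$$\approx \min_D c^2 D^{-p/\beta}\,\mathrm{SNR}^{-2p} + D. \qquad (75)$$

Differentiating and setting equal to zero yields

$$D^* = \left(\frac{p c^2}{\beta}\right)^{\beta/(p+\beta)} \mathrm{SNR}^{-2p\beta/(p+\beta)} \qquad (76)$$

and thus

$$E[D] \approx C_{\mathrm{SEL\text{-}CCDIV}}\,\mathrm{SNR}^{-2p\beta/(p+\beta)} \qquad (77)$$

where $C_{\mathrm{SEL\text{-}CCDIV}}$ is a constant independent of SNR.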
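The closed form (76) can be checked numerically. The sketch below uses hypothetical helper names, with $c$, $p$, and $\beta$ the channel-model constants used above. It evaluates the high-SNR objective (75) at $D^*$ for two SNR values and recovers the exponent $2p\beta/(p+\beta)$ from the slope of $\log E[D]$ versus $\log \mathrm{SNR}$.

```python
import math


def sel_objective(d, snr, p, beta, c):
    """High-SNR objective (75): outage term plus distortion term."""
    return c**2 * d ** (-p / beta) * snr ** (-2 * p) + d


def sel_optimal_d(snr, p, beta, c):
    """Stationary point (76) of (75)."""
    return (p * c**2 / beta) ** (beta / (p + beta)) * snr ** (-2 * p * beta / (p + beta))


def sel_fitted_exponent(snr1, snr2, p, beta, c=1.0):
    """Slope of -log E[D] against log SNR between snr1 and snr2."""
    e1 = sel_objective(sel_optimal_d(snr1, p, beta, c), snr1, p, beta, c)
    e2 = sel_objective(sel_optimal_d(snr2, p, beta, c), snr2, p, beta, c)
    return -(math.log(e2) - math.log(e1)) / (math.log(snr2) - math.log(snr1))


if __name__ == "__main__":
    p, beta = 1.0, 1.0
    print(sel_fitted_exponent(1e4, 1e8, p, beta), 2 * p * beta / (p + beta))
```

At $D^*$ the two terms of (75) are in the fixed ratio $\beta/p$, so the minimum is an exact power of SNR under this model and the fitted slope matches $2p\beta/(p+\beta)$ to floating-point precision.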
B. Distortion Exponent For Multiplexed Channel Coding Diversity

Proof of Theorem^ The minimum expected distortion for such a scheme is computed as follows: 

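$$E[D] = \min_D \Pr\{\min[I(x_1;y_1), I(x_2;y_2)] < R(D)/2\} + \Pr\{\min[I(x_1;y_1), I(x_2;y_2)] \ge R(D)/2\}\cdot D \qquad (78)$$
$$= \min_D 2F_{e^{I}}(\exp[R(D)/2]) - F_{e^{I}}(\exp[R(D)/2])^2 + \left[1 - F_{e^{I}}(\exp[R(D)/2])\right]^2\cdot D \qquad (79)$$
$$\approx \min_D 2c D^{-p/(4\beta)}\,\mathrm{SNR}^{-p} - c^2 D^{-p/(2\beta)}\,\mathrm{SNR}^{-2p} + \left(1 - c D^{-p/(4\beta)}\,\mathrm{SNR}^{-p}\right)^2\cdot D \qquad (80)$$
$$\approx \min_D 2c D^{-p/(4\beta)}\,\mathrm{SNR}^{-p} + D. \qquad (81)$$

Differentiating and setting equal to zero yields the optimizing distortion

$$D^* = \left(\frac{p c}{2\beta}\right)^{4\beta/(p+4\beta)} \mathrm{SNR}^{-4p\beta/(p+4\beta)} \qquad (82)$$

and thus

$$E[D] \approx C_{\mathrm{MPX\text{-}CCDIV}}\,\mathrm{SNR}^{-4p\beta/(p+4\beta)} \qquad (83)$$

where $C_{\mathrm{MPX\text{-}CCDIV}}$ is a constant independent of SNR. ■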
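A similar numerical sketch, with hypothetical helper names and the same assumed constants $c$, $p$, and $\beta$, checks the optimizing distortion obtained by differentiating (81). It also shows that the terms dropped in going from (80) to (81) become negligible at that distortion as SNR grows.

```python
import math


def mpx_full(d, snr, p, beta, c):
    """Objective (80), with q = F(exp[R(D)/2]) under the high-SNR model."""
    q = c * d ** (-p / (4 * beta)) * snr ** (-p)
    return 2 * q - q**2 + (1 - q) ** 2 * d


def mpx_approx(d, snr, p, beta, c):
    """Simplified objective (81)."""
    return 2 * c * d ** (-p / (4 * beta)) * snr ** (-p) + d


def mpx_optimal_d(snr, p, beta, c):
    """Stationary point of (81)."""
    return (p * c / (2 * beta)) ** (4 * beta / (p + 4 * beta)) * snr ** (-4 * p * beta / (p + 4 * beta))


def mpx_fitted_exponent(snr1, snr2, p, beta, c=1.0):
    """Slope of -log E[D] against log SNR, using (81) at the optimum."""
    e1 = mpx_approx(mpx_optimal_d(snr1, p, beta, c), snr1, p, beta, c)
    e2 = mpx_approx(mpx_optimal_d(snr2, p, beta, c), snr2, p, beta, c)
    return -(math.log(e2) - math.log(e1)) / (math.log(snr2) - math.log(snr1))


if __name__ == "__main__":
    snr, p, beta, c = 1e8, 1.0, 1.0, 1.0
    d = mpx_optimal_d(snr, p, beta, c)
    print(mpx_fitted_exponent(1e4, 1e8, p, beta),
          mpx_full(d, snr, p, beta, c) / mpx_approx(d, snr, p, beta, c))
```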

C. Distortion Exponent for Optimal Channel Coding Diversity 

Before proving Theorem 0] we require the following lemma characterizing the mutual information for 
the parallel channel in terms of the probability distribution for each sub-channel.
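Lemma 1: Let

$$I(x;y) = I(x_1;y_1) + I(x_2;y_2)$$

be the mutual information for the total channel, and assume that the density and distribution for each
sub-channel are given by (43) and (44). If we define the cumulative distribution function for $\exp I(x;y)$
as $F_{e^{I_0+I_1}}(t)$, then

$$F_{e^{I_0+I_1}}(t) \approx p c^2 \left(\frac{t}{\mathrm{SNR}^2}\right)^p \left(\ln t - \frac{1}{p}\right) \qquad (84)$$

in the sense that the ratio of these quantities goes to 1 as $\mathrm{SNR} \to \infty$.

Proof: Note that for any random variable $a$ with density $f_a(t)$, we have

$$f_{e^a}(t) = f_a(\ln t)/t \quad \text{and} \quad f_a(t) = f_{e^a}(e^t)\cdot e^t. \qquad (85)$$

Therefore, we can obtain the desired result by computing the pdf via convolution and applying (85):

$$f_{I_0+I_1}(t) = \int_0^t f_{I_0}(\tau)\, f_{I_1}(t-\tau)\, d\tau \qquad (86)$$
$$= \int_0^t f_{e^{I_0}}(e^{\tau})\, e^{\tau} \cdot f_{e^{I_1}}(e^{t-\tau})\, e^{t-\tau}\, d\tau \qquad (87)$$
$$\approx \int_0^t \frac{p c\, e^{\tau(p-1)}}{\mathrm{SNR}^p}\, e^{\tau} \cdot \frac{p c\, e^{(t-\tau)(p-1)}}{\mathrm{SNR}^p}\, e^{t-\tau}\, d\tau \qquad (88)$$
$$= \frac{p^2 c^2}{\mathrm{SNR}^{2p}} \int_0^t e^{pt}\, d\tau \qquad (89)$$
$$= \frac{c^2 p^2\, t\, e^{pt}}{\mathrm{SNR}^{2p}} \qquad (90)$$

$$F_{e^{I_0+I_1}}(t) = \int_{-\infty}^{t} f_{e^{I_0+I_1}}(\tau)\, d\tau \qquad (91)$$
$$= \int_{-\infty}^{t} \frac{f_{I_0+I_1}(\ln \tau)}{\tau}\, d\tau \qquad (92)$$
$$\approx \int_1^t \frac{c^2 p^2\, \tau^{p-1} \ln \tau}{\mathrm{SNR}^{2p}}\, d\tau \qquad (93)$$
$$= \frac{c^2 p^2}{\mathrm{SNR}^{2p}} \left(\frac{t^p \ln t}{p} - \frac{t^p}{p^2} + \frac{1}{p^2}\right) \qquad (94)$$
$$\approx p c^2 \left(\frac{t}{\mathrm{SNR}^2}\right)^p \left(\ln t - \frac{1}{p}\right) \qquad (95)$$

where (88) follows from the high SNR approximation in (43b), and (93) follows from substituting (90) into
(92) and noting that, since $I(x;y)$ is positive, $f_{I_0+I_1}(\ln t)$ is non-zero only for $t > 1$. The final
line follows from noting that the last parenthesized term in (94) is negligible at high SNR. ■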
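The steps of this proof can be checked numerically. The sketch below assumes the high-SNR sub-channel density $f_{e^{I}}(t) = c\,p\,t^{p-1}/\mathrm{SNR}^p$ for $t \ge 1$, which is the form used in the step from (87) to (88). It compares a numerical convolution (86) with the closed form (90), and a numerical integration of the CDF with its closed form before and after the small term is dropped. The helper names and the use of Simpson's rule are our own choices.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] (n is rounded up to be even)."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def f_single(x, p, c, snr):
    """Density of one sub-channel's I on x >= 0 implied by the assumed
    form f_{e^I}(t) = c p t^(p-1) / SNR^p together with (85)."""
    return c * p * math.exp(p * x) / snr**p if x >= 0 else 0.0


def f_sum_numeric(x, p, c, snr):
    """Density of I_0 + I_1 by numerical convolution, as in (86)."""
    return simpson(lambda tau: f_single(tau, p, c, snr) * f_single(x - tau, p, c, snr), 0.0, x)


def f_sum_closed(x, p, c, snr):
    """Closed form (90)."""
    return c**2 * p**2 * x * math.exp(p * x) / snr ** (2 * p)


def cdf_numeric(t, p, c, snr):
    """Integral of f_{I_0+I_1}(ln tau) / tau over [1, t]."""
    return simpson(lambda tau: f_sum_closed(math.log(tau), p, c, snr) / tau, 1.0, t)


def cdf_exact(t, p, c, snr):
    """CDF before the 1/p^2 term is dropped (the form of (94))."""
    return c**2 * p**2 / snr ** (2 * p) * (t**p * math.log(t) / p - t**p / p**2 + 1 / p**2)


def cdf_lemma(t, p, c, snr):
    """Approximation (84) stated in Lemma 1."""
    return p * c**2 * (t / snr**2) ** p * (math.log(t) - 1 / p)
```

The first two comparisons agree to numerical-integration accuracy. The ratio of the Lemma 1 approximation to the exact CDF approaches one as $t$ grows, which is the regime relevant at high SNR.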
Proof of Theorem ^ To compute the minimum average distortion we have 

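$$E[D] = \min_D \Pr[I(x_1;y_1) + I(x_2;y_2) < R(D)] + \{1 - \Pr[I(x_1;y_1) + I(x_2;y_2) < R(D)]\}\cdot D \qquad (96)$$
$$= \min_D F_{e^{I_0+I_1}}(\exp R(D)) + \left[1 - F_{e^{I_0+I_1}}(\exp R(D))\right]\cdot D \qquad (97)$$
$$\approx \min_D p c^2 \frac{D^{-p/(2\beta)}}{\mathrm{SNR}^{2p}} \left(-\frac{1}{2\beta}\ln D - \frac{1}{p} + \frac{D^{p/(2\beta)}}{p}\right) + \left[1 - p c^2 \frac{D^{-p/(2\beta)}}{\mathrm{SNR}^{2p}} \left(-\frac{1}{2\beta}\ln D - \frac{1}{p} + \frac{D^{p/(2\beta)}}{p}\right)\right]\cdot D \qquad (98)$$
$$\approx \min_D p c^2 \frac{D^{-p/(2\beta)}}{\mathrm{SNR}^{2p}} \left(-\frac{1}{2\beta}\ln D - \frac{1}{p} + \frac{D^{p/(2\beta)}}{p}\right) + D. \qquad (99)$$

By noting that the parenthesized term in (99) is between $1/p$ and $(1 + e^{-1})/p$ when $D < \exp\{-1/(p\beta)\}$,
we obtain

$$\min_D c^2 \frac{D^{-p/(2\beta)}}{\mathrm{SNR}^{2p}} + D \;\le\; E[D] \;\le\; \min_D (1 + e^{-1})\, c^2 \frac{D^{-p/(2\beta)}}{\mathrm{SNR}^{2p}} + D. \qquad (100)$$

Differentiating the lower bound and setting equal to zero yields the optimizing distortion

$$D^* = \mathrm{SNR}^{-4p\beta/(p+2\beta)} \cdot \left(\frac{p c^2}{2\beta}\right)^{2\beta/(p+2\beta)}. \qquad (101)$$

Substituting (101) into (100) yields

$$C_{\mathrm{LB}} \cdot \mathrm{SNR}^{-4p\beta/(p+2\beta)} \le E[D] \le C_{\mathrm{UB}} \cdot \mathrm{SNR}^{-4p\beta/(p+2\beta)} \qquad (102)$$

where $C_{\mathrm{LB}}$ and $C_{\mathrm{UB}}$ are terms independent of SNR. Hence we conclude that the distortion exponent is

$$\Delta_{\mathrm{OPT\text{-}CCDIV}} = \frac{4p\beta}{p + 2\beta}. \qquad (103)$$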
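The following sketch minimizes the objective (99) numerically over $D$ and estimates the exponent from the slope of $\log E[D]$ versus $\log \mathrm{SNR}$. The helper names are hypothetical, and $c$, $p$, and $\beta$ are as above. Because (99) keeps a logarithmic factor in $D$, the slope measured between two finite SNR values comes out slightly below $4p\beta/(p+2\beta)$. The exponent itself is unaffected.

```python
import math


def opt_objective(x, snr, p, beta, c):
    """Objective (99) written in terms of x = ln D, for D in (0, 1]."""
    d = math.exp(x)
    paren = -x / (2 * beta) - 1 / p + d ** (p / (2 * beta)) / p
    return p * c**2 * d ** (-p / (2 * beta)) / snr ** (2 * p) * paren + d


def opt_min_distortion(snr, p, beta, c=1.0, lo=-100.0, hi=0.0, iters=200):
    """Ternary search over x = ln D. With u = -p x / (2 beta) >= 0, the first
    term of (99) equals c^2 SNR^(-2p) (u e^u - e^u + 1), which is convex in x,
    and e^x is convex, so the objective is unimodal on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if opt_objective(m1, snr, p, beta, c) < opt_objective(m2, snr, p, beta, c):
            hi = m2
        else:
            lo = m1
    return opt_objective((lo + hi) / 2, snr, p, beta, c)


def opt_fitted_exponent(snr1, snr2, p, beta, c=1.0):
    """Slope of -log E[D] against log SNR between snr1 and snr2."""
    e1 = opt_min_distortion(snr1, p, beta, c)
    e2 = opt_min_distortion(snr2, p, beta, c)
    return -(math.log(e2) - math.log(e1)) / (math.log(snr2) - math.log(snr1))


if __name__ == "__main__":
    print(opt_fitted_exponent(1e6, 1e12, 1.0, 1.0), 4 / 3)
```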



D. Distortion Exponent for Source Coding Diversity 

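Proof of Theorem^ For small $D_0$ and $D_1$, the average distortion is

$$E[D] = \min_{D_0,D_1} \Pr\left[I(x_i;y_i) < \tfrac{R_{\mathrm{md}}(D_0,D_1)}{2}\right]^2 + 2\Pr\left[I(x_i;y_i) < \tfrac{R_{\mathrm{md}}(D_0,D_1)}{2}\right]\Pr\left[I(x_i;y_i) \ge \tfrac{R_{\mathrm{md}}(D_0,D_1)}{2}\right] D_1 + \Pr\left[I(x_i;y_i) \ge \tfrac{R_{\mathrm{md}}(D_0,D_1)}{2}\right]^2 D_0 \qquad (104)$$
$$= \min_{D_0,D_1} F_{e^{I}}\!\left(e^{R_{\mathrm{md}}(D_0,D_1)/2}\right)^2 + 2 F_{e^{I}}\!\left(e^{R_{\mathrm{md}}(D_0,D_1)/2}\right)\left[1 - F_{e^{I}}\!\left(e^{R_{\mathrm{md}}(D_0,D_1)/2}\right)\right] D_1 + \left[1 - F_{e^{I}}\!\left(e^{R_{\mathrm{md}}(D_0,D_1)/2}\right)\right]^2 D_0 \qquad (105)$$
$$\approx \min_{D_0,D_1} \frac{c^2}{\mathrm{SNR}^{2p}} e^{p R_{\mathrm{md}}(D_0,D_1)} + \frac{2c}{\mathrm{SNR}^{p}} e^{\frac{p}{2} R_{\mathrm{md}}(D_0,D_1)}\left[1 - \frac{c}{\mathrm{SNR}^{p}} e^{\frac{p}{2} R_{\mathrm{md}}(D_0,D_1)}\right] D_1 + \left[1 - \frac{c}{\mathrm{SNR}^{p}} e^{\frac{p}{2} R_{\mathrm{md}}(D_0,D_1)}\right]^2 D_0 \qquad (106)$$
$$\approx \min_{D_0,D_1} \frac{c^2}{\mathrm{SNR}^{2p}} e^{p R_{\mathrm{md}}(D_0,D_1)} + \frac{2c}{\mathrm{SNR}^{p}} e^{\frac{p}{2} R_{\mathrm{md}}(D_0,D_1)} D_1 + D_0. \qquad (107)$$

Substituting the bounds from (30b) into (107) yields

$$E[D] \ge \min_{D_0,D_1} \frac{c^2}{\mathrm{SNR}^{2p}} \left(\frac{1}{4 D_0 D_1}\right)^{p/(2\beta)} + \frac{2c}{\mathrm{SNR}^{p}} \left(\frac{1}{4 D_0 D_1}\right)^{p/(4\beta)} D_1 + D_0 \qquad (108b)$$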
where (108a) requires $D_0 \to 0$ and (108b) also requires $D_1 \to 0$.

When $p > 4\beta$, (108) increases as $D_1$ becomes small. Hence, in this regime the optimal choice for
$D_1$ approaches a constant bounded away from zero. If the low distortion formula for the lower bound
is used, then the optimal choice for $D_1$ approaches one. Technically, however, for $D_1 > 1/2$ the rate
required is given by (26b), not (26a), so there is no excess rate in multiple description coding [16], [28],
and the optimal $D_1$ for $p > 4\beta$ approaches 1/2 using (26b). In any case, regardless of whether $D_1 = 1/2$
or $D_1 = 1$ or some other intermediate value, when $p > 4\beta$, average distortion is minimized by choosing
$D_1$ to be large. Thus, for $p > 4\beta$, the optimal multiple description system essentially degenerates into
the channel multiplexing scheme analyzed in Section IV-D and achieves the same distortion exponent
(although with a slightly different constant factor term).

When $p < 4\beta$, we can find the optimal value for $D_1$ by differentiating the lower bound with respect
to $D_1$ and setting the result equal to zero to obtain



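$$D_1^* = \left(\frac{4\beta - p}{c p}\right)^{-4\beta/(4\beta+p)} \mathrm{SNR}^{-4\beta p/(4\beta+p)}\, (4 D_0)^{-p/(4\beta+p)}, \qquad p < 4\beta. \qquad (109)$$

For the case when $p < 4\beta$, substituting (109) into (108b) yields

$$E[D] \ge C \cdot D_0^{-2p/(4\beta+p)} \cdot \mathrm{SNR}^{-8\beta p/(4\beta+p)} + D_0 \qquad \text{for } p < 4\beta \qquad (110)$$

where $C$ is a constant independent of SNR and $D_0$. Differentiating with respect to $D_0$ and setting the
result equal to zero yields the optimal value for $D_0$:

$$D_0^* \approx \begin{cases} C' \cdot \mathrm{SNR}^{-8\beta p/(4\beta+3p)}, & p < 4\beta \\ C'' \cdot \mathrm{SNR}^{-4\beta p/(4\beta+p)}, & p > 4\beta \end{cases} \qquad (111)$$

from which we conclude

$$C_{\mathrm{LB}} \cdot \mathrm{SNR}^{-\max\left\{\frac{8\beta p}{4\beta+3p},\, \frac{4\beta p}{4\beta+p}\right\}} \le E[D] \le C_{\mathrm{UB}} \cdot \mathrm{SNR}^{-\max\left\{\frac{8\beta p}{4\beta+3p},\, \frac{4\beta p}{4\beta+p}\right\}} \qquad (112)$$

where the max occurs since multiple description coding essentially degenerates into channel multiplexing
with a better constant factor when $p > 4\beta$. ■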
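For $p < 4\beta$, the exponent above can be checked by minimizing (108b) numerically over both $D_0$ and $D_1$. This sketch uses hypothetical names, with $c$, $p$, and $\beta$ as above. It runs nested one-dimensional ternary searches over $\ln D_0$ and $\ln D_1$ and fits the slope of $\log E[D]$ against $\log \mathrm{SNR}$. It deliberately does not cover $p > 4\beta$, where the optimal $D_1$ is pushed to values at which the low-distortion bound (108b) no longer applies.

```python
import math


def scdiv_objective(x0, x1, snr, p, beta, c):
    """Low-distortion lower bound (108b) with D_0 = e^x0 and D_1 = e^x1."""
    inv = math.exp(-x0 - x1) / 4.0           # 1 / (4 D_0 D_1)
    return (c**2 * snr ** (-2 * p) * inv ** (p / (2 * beta))
            + 2 * c * snr ** (-p) * inv ** (p / (4 * beta)) * math.exp(x1)
            + math.exp(x0))


def ternary_min(f, lo, hi, iters=100):
    """Minimum value of a unimodal function of one variable on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f((lo + hi) / 2)


def scdiv_min(snr, p, beta, c=1.0, lo=-60.0, hi=0.0):
    """Each term of (108b) is the exponential of an affine function of
    (x0, x1), so the objective is jointly convex; minimizing over x1 leaves
    a convex function of x0, and nested 1-D searches suffice."""
    def best_over_d1(x0):
        return ternary_min(lambda x1: scdiv_objective(x0, x1, snr, p, beta, c), lo, hi)
    return ternary_min(best_over_d1, lo, hi)


def scdiv_fitted_exponent(snr1, snr2, p, beta, c=1.0):
    """Slope of -log E[D] against log SNR between snr1 and snr2."""
    e1, e2 = scdiv_min(snr1, p, beta, c), scdiv_min(snr2, p, beta, c)
    return -(math.log(e2) - math.log(e1)) / (math.log(snr2) - math.log(snr1))


if __name__ == "__main__":
    print(scdiv_fitted_exponent(1e6, 1e10, 1.0, 1.0), 8 / 7)
```

The three terms of (108b) scale as powers of SNR, so the minimum is an exact power law. For $p = \beta = 1$ the fitted slope reproduces $8\beta p/(4\beta + 3p) = 8/7$.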

E. Distortion Exponent for Source Coding Diversity with Joint Decoding 

Computing the exact rates required to guarantee successful encoding in (65) is generally difficult; thus,
we focus on the high resolution limit in the following lemma.

Lemma 2: Let $s$ be a source with finite variance and finite entropy power. Then in the high resolution
limit, choosing

$$R > h(s) - \frac{1}{2}\log 2\pi e\sigma^2 \qquad (113)$$

asymptotically satisfies (65) and guarantees successful encoding.
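Proof: Proving the claim requires showing that

$$\lim_{\sigma^2 \to 0} I(s;\hat{s}_j) - \left[h(s) - \frac{1}{2}\log 2\pi e\sigma^2\right] = 0 \quad \text{for } j \in \{1,2\} \qquad (114)$$

and

$$\lim_{\sigma^2 \to 0} I(s;\hat{s}_1\hat{s}_2) + I(\hat{s}_1;\hat{s}_2) - 2\left[h(s) - \frac{1}{2}\log 2\pi e\sigma^2\right] = 0. \qquad (115)$$

The former follows from the fact that the Shannon Lower Bound is asymptotically tight [50]. In the
interest of completeness, however, we define $\Delta R$ as the left hand side of (114) and summarize the argument
showing that it goes to zero:

$$\Delta R = \lim_{\sigma^2 \to 0} I(s;\hat{s}_j) - \left[h(s) - \frac{1}{2}\log 2\pi e\sigma^2\right] \qquad (116)$$
$$= \lim_{\sigma^2 \to 0} h(s + n_j) - h(s + n_j \mid s) - \left[h(s) - \frac{1}{2}\log 2\pi e\sigma^2\right] \qquad (117)$$
$$= \lim_{\sigma^2 \to 0} h(s + n_j) - h(n_j) - \left[h(s) - \frac{1}{2}\log 2\pi e\sigma^2\right] \qquad (118)$$
$$= \lim_{\sigma^2 \to 0} h(s + n_j) - h(s) \qquad (119)$$
$$= 0. \qquad (120)$$

Equations (117) and (118) follow from the choice of the conditional distribution $\hat{s}_j = s + n_j$, where $n_j$ is
independent of $s$. The key step in going from (119) to (120) is the "continuity" property of differential
entropy [50, Theorem 1], which is the main tool in obtaining many high-resolution source coding results.

A similar chain of equalities establishes (115). Specifically, if we define the left hand side of (115)
as $\Delta_2 R$, then we obtain

$$\Delta_2 R = \lim_{\sigma^2 \to 0} I(s;\hat{s}_1\hat{s}_2) + I(\hat{s}_1;\hat{s}_2) - 2\left[h(s) - \frac{1}{2}\log 2\pi e\sigma^2\right] \qquad (121)$$
$$= \lim_{\sigma^2 \to 0} h(\hat{s}_1\hat{s}_2) - h(\hat{s}_1\hat{s}_2 \mid s) + h(\hat{s}_1) - h(\hat{s}_1 \mid \hat{s}_2) - 2\left[h(s) - \frac{1}{2}\log 2\pi e\sigma^2\right] \qquad (122)$$
$$= \lim_{\sigma^2 \to 0} h(\hat{s}_1) + h(\hat{s}_2) - h(\hat{s}_1 \mid s) - h(\hat{s}_2 \mid s) - 2\left[h(s) - \frac{1}{2}\log 2\pi e\sigma^2\right] \qquad (123)$$
$$= 2\,\Delta R \qquad (124)$$
$$= 0 \qquad (125)$$

where (123) uses $h(\hat{s}_1\hat{s}_2) = h(\hat{s}_2) + h(\hat{s}_1 \mid \hat{s}_2)$ together with the conditional independence of $\hat{s}_1$ and $\hat{s}_2$ given $s$,
(124) follows by noting that (123) is simply twice (116) (by the symmetry between the two descriptions), and hence (125) follows from (120). ■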
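For a unit-variance Gaussian source, every quantity in Lemma 2 has a closed form, which makes the limits (114) and (115) easy to check. The sketch below assumes $s \sim \mathcal{N}(0,1)$, so that $h(s) = \frac{1}{2}\ln 2\pi e$. This matches the normalization $\exp[2h(s)] = 2\pi e$ used in the proof of Theorem 7. All quantities are in nats, and the function name is hypothetical.

```python
import math


def gaussian_rate_gaps(sigma2):
    """Return the expressions inside the limits of (114) and (115), i.e.
    (Delta R, Delta_2 R) before letting sigma2 -> 0, for s ~ N(0, 1) and
    s_hat_j = s + n_j with independent n_j ~ N(0, sigma2). Nats."""
    h_s = 0.5 * math.log(2 * math.pi * math.e)
    rate = h_s - 0.5 * math.log(2 * math.pi * math.e * sigma2)   # right side of (113)
    i_s_s1 = 0.5 * math.log(1 + 1 / sigma2)                       # I(s; s_hat_j)
    i_s_s12 = 0.5 * math.log(1 + 2 / sigma2)                      # I(s; s_hat_1 s_hat_2)
    # I(s_hat_1; s_hat_2) = -1/2 ln(1 - rho^2) with rho = 1/(1 + sigma2),
    # written in a form that avoids cancellation for small sigma2
    i_s1_s2 = 0.5 * math.log((1 + sigma2) ** 2 / (sigma2 * (2 + sigma2)))
    return i_s_s1 - rate, i_s_s12 + i_s1_s2 - 2 * rate
```

For this source the gap in (114) equals $\frac{1}{2}\ln(1+\sigma^2)$ and the gap in (115) is exactly twice that, mirroring (124). Both vanish as $\sigma^2 \to 0$.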
In the sequel, we require the following lemma, which states that, in the high resolution limit, the two
descriptions $\hat{s}_1$ and $\hat{s}_2$ differ by only about half a bit per sample. This close relationship between the two
descriptions enables the joint decoder to approach the performance of parallel channel coding with a
single description.

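Lemma 3: If the rate is chosen according to (113), specifically, if the difference between the two sides
is $\epsilon$, then

$$\lim_{\sigma^2 \to 0} I(\hat{s}_1;\hat{s}_2) - R \ge -\frac{1}{2}\log 2 - \epsilon. \qquad (126)$$

Proof: We have the following chain of inequalities, where all limits are taken as $\sigma^2 \to 0$:

$$\lim I(\hat{s}_1;\hat{s}_2) - R = \lim h(s + n_1) - h(s + n_1 \mid s + n_2) - R \qquad (127)$$
$$= \lim h(s + n_1) - h(n_1 - n_2 \mid s + n_2) - R \qquad (128)$$
$$\ge \lim h(s + n_1) - h(n_1 - n_2) - R \qquad (129)$$
$$= \lim h(s + n_1) - \frac{1}{2}\log 4\pi e\sigma^2 - R \qquad (130)$$
$$= \lim h(s + n_1) - h(n_1) - \frac{1}{2}\log 2 - R \qquad (131)$$
$$= \lim I(\hat{s}_1;s) - R - \frac{1}{2}\log 2 \qquad (132)$$
$$= \lim I(\hat{s}_1;s) - \left[h(s) - \frac{1}{2}\log 2\pi e\sigma^2 + \epsilon\right] - \frac{1}{2}\log 2 \qquad (133)$$
$$= \Delta R - \frac{1}{2}\log 2 - \epsilon \qquad (134)$$
$$= -\frac{1}{2}\log 2 - \epsilon. \qquad (135)$$

Most of the arguments follow from well-known properties of mutual information and entropy. Equation
(135) follows from Lemma 2.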
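Continuing the Gaussian example, the quantity bounded in Lemma 3 can be evaluated directly, with $R$ chosen $\epsilon$ above the right-hand side of (113). This sketch assumes a unit-variance Gaussian source and uses nats; the function name is hypothetical.

```python
import math


def description_overlap_gap(sigma2, eps=0.0):
    """I(s_hat_1; s_hat_2) - R for s ~ N(0, 1), s_hat_j = s + n_j with
    independent n_j ~ N(0, sigma2), and R = (1/2) ln(1/sigma2) + eps,
    i.e. the right-hand side of (113) plus eps. Nats."""
    i_s1_s2 = 0.5 * math.log((1 + sigma2) ** 2 / (sigma2 * (2 + sigma2)))
    rate = 0.5 * math.log(1 / sigma2) + eps
    return i_s1_s2 - rate
```

For this Gaussian example, the bound in (126) holds at every $\sigma^2 > 0$ and not just in the limit, because $(1+\sigma^2)^2/(2+\sigma^2) \ge 1/2$. The gap approaches $-\frac{1}{2}\ln 2 - \epsilon$ as $\sigma^2 \to 0$.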



Proof of Theorem 7: If we choose $\sigma^2$ as in (70), the expected distortion is at most the distortion when
both descriptions are successfully decoded times the probability of that event, plus the probability that both descriptions are not decoded
(the distortion in that case being at most one). Hence, applying Theorem 6 yields

$$E[D] \le \frac{\sigma^2}{2}\cdot\Pr[\mathcal{E}] + \Pr[\mathcal{E}^c] \qquad (136)$$

where $\mathcal{E}$ denotes the event that both descriptions can be decoded as defined in (66), and $\mathcal{E}^c$ is the
complement of $\mathcal{E}$. Note that since $\Pr[\mathcal{E}] \le 1$, the first term on the right hand side of (136) is at most $\sigma^2/2$, which is proportional
to $\mathrm{SNR}^{-\Delta_{\mathrm{OPT\text{-}CCDIV}}}$ by construction due to our choice of $\sigma^2$ in (70). Therefore, to prove the theorem,
we need to bound the second term, $\Pr[\mathcal{E}^c]$.
If we let $\mathcal{E}[i,j]$ (with $i,j \in \{1,2\}$) denote the event that the first max operation in (66) returns the $i$th
argument while the second max operation in (66) returns the $j$th argument, then we can express the second
term in (136) as

$$\Pr[\mathcal{E}^c] = \Pr[\mathcal{E}^c \mid \mathcal{E}[1,1]]\Pr[\mathcal{E}[1,1]] + \Pr[\mathcal{E}^c \mid \mathcal{E}[1,2]]\Pr[\mathcal{E}[1,2]] + \Pr[\mathcal{E}^c \mid \mathcal{E}[2,1]]\Pr[\mathcal{E}[2,1]] + \Pr[\mathcal{E}^c \mid \mathcal{E}[2,2]]\Pr[\mathcal{E}[2,2]]. \qquad (137)$$

To prove the theorem, it is sufficient to show that for every $\epsilon > 0$, there exists a constant $C_{ij}$ such that

$$\Pr[\mathcal{E}^c \mid \mathcal{E}[i,j]]\Pr[\mathcal{E}[i,j]] \le C_{ij} \cdot \mathrm{SNR}^{-\Delta_{\mathrm{OPT\text{-}CCDIV}} + \epsilon}$$

for large enough SNR.

Conditioned on $\mathcal{E}[1,1]$, both $I(x_1;y_1) > R/\beta$ and $I(x_2;y_2) > R/\beta$, so both channels are good enough
to decode each description separately. Thus $\Pr[\mathcal{E}^c \mid \mathcal{E}[1,1]] = 0$, and therefore $\Pr[\mathcal{E}^c \mid \mathcal{E}[1,1]]\Pr[\mathcal{E}[1,1]] = 0$
as well. This takes care of the first term in (137).

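Next we consider the second term of (137). Conditioned on $\mathcal{E}[1,2]$, only $I(x_1;y_1) > R/\beta$, while
$I(x_2;y_2) < R/\beta$, and only description 1 can be decoded separately. Description 2 can be decoded jointly
provided that $I(x_2;y_2) > R/\beta - I(\hat{s}_1;\hat{s}_2)/\beta$. By applying Lemma 3, this condition becomes $I(x_2;y_2) >
(\log 2)/(2\beta)$ in the high-resolution limit; therefore

$$\Pr[\mathcal{E}^c \mid \mathcal{E}[1,2]]\Pr[\mathcal{E}[1,2]] \le \Pr\left[I(x_2;y_2) < \frac{\log 2}{2\beta}\right]\cdot\Pr[\mathcal{E}[1,2]] \qquad (138)$$
$$\approx c\left(\frac{2^{1/(2\beta)}}{\mathrm{SNR}}\right)^p \Pr[\mathcal{E}[1,2]] \qquad (139)$$
$$\le c\left(\frac{2^{1/(2\beta)}}{\mathrm{SNR}}\right)^p \Pr[I(x_2;y_2) < R/\beta] \qquad (140)$$
$$\approx c^2\left(\frac{2^{1/(2\beta)}}{\mathrm{SNR}}\right)^p \left(\frac{\exp\{[h(s) - \frac{1}{2}\log 2\pi e]/\beta\}}{\sigma^{1/\beta}\,\mathrm{SNR}}\right)^p \qquad (141)$$
$$= c^2\, \mathrm{SNR}^{-2p} \cdot \mathrm{SNR}^{2p^2/(p+2\beta)} \cdot 2^{p/(2\beta)} \exp\left\{\frac{p}{\beta}\left[h(s) - \frac{1}{2}\log 2\pi e\right]\right\} \qquad (142)$$
$$= c^2\, \mathrm{SNR}^{-4p\beta/(p+2\beta)} \cdot 2^{p/(2\beta)} \exp\left\{\frac{p}{\beta}\left[h(s) - \frac{1}{2}\log 2\pi e\right]\right\} \qquad (143)$$

where in going from (140) to (141) we replaced $R$ with $h(s) - (1/2)\log 2\pi e\sigma^2$ and recalled that we
assumed $\exp[2h(s)] = 2\pi e$ just after (23), and (142) follows from the choice of $\sigma^2$ in (70).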

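Thus, for some constant $C_{\mathrm{SCDIV\text{-}JD}}$ and every $\epsilon > 0$, there exists an SNR large enough such that

$$\Pr[\mathcal{E}^c \mid \mathcal{E}[1,2]]\Pr[\mathcal{E}[1,2]] \le \mathrm{SNR}^{-\Delta_{\mathrm{OPT\text{-}CCDIV}} + \epsilon} \cdot C_{\mathrm{SCDIV\text{-}JD}} \qquad (144)$$

and, by a similar analysis for the third term of (137),

$$\Pr[\mathcal{E}^c \mid \mathcal{E}[2,1]]\Pr[\mathcal{E}[2,1]] \le \mathrm{SNR}^{-\Delta_{\mathrm{OPT\text{-}CCDIV}} + \epsilon} \cdot C_{\mathrm{SCDIV\text{-}JD}}. \qquad (145)$$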

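Finally, we consider the last term in (137). Conditioned on $\mathcal{E}[2,2]$, both $I(x_1;y_1) < R/\beta$ and $I(x_2;y_2) <
R/\beta$, so neither channel is good enough for separate decoding. Successful joint decoding requires

$$I(x_1;y_1) + I(x_2;y_2) > [2R - I(\hat{s}_1;\hat{s}_2)]/\beta, \qquad (146)$$

and therefore

$$\Pr[\mathcal{E}^c \mid \mathcal{E}[2,2]]\Pr[\mathcal{E}[2,2]] \le \Pr\left[I(x_1;y_1) + I(x_2;y_2) < \frac{2R - I(\hat{s}_1;\hat{s}_2)}{\beta}\right] \qquad (147)$$
$$\le \Pr\left[I(x_1;y_1) + I(x_2;y_2) < \frac{R}{\beta} + \frac{\log 2}{2\beta} + \frac{\epsilon}{\beta}\right] \qquad (148)$$
$$= F_{e^{I_0+I_1}}\!\left(2^{1/(2\beta)}\, e^{\epsilon/\beta}\, e^{R/\beta}\right) \qquad (149)$$
$$\approx p c^2 \left(\frac{2^{1/(2\beta)}\, e^{\epsilon/\beta}\, e^{R/\beta}}{\mathrm{SNR}^2}\right)^p \left(\frac{R}{\beta} + \frac{\log 2}{2\beta} + \frac{\epsilon}{\beta} - \frac{1}{p}\right) \qquad (150)$$
$$\approx \log(\mathrm{SNR}) \cdot C_{\mathrm{SCDIV\text{-}JD}} \cdot \mathrm{SNR}^{-4p\beta/(p+2\beta)} \qquad (151)$$
$$\le \mathrm{SNR}^{-4p\beta/(p+2\beta) + \epsilon} \cdot C_{\mathrm{SCDIV\text{-}JD}} \qquad (152)$$

where (148) follows since Lemma 3 implies

$$2R - I(\hat{s}_1;\hat{s}_2) \le R + \frac{1}{2}\log 2 + \epsilon, \qquad (153)$$

(150) follows from Lemma 1, (151) follows by substituting $R$ from (113) and $\sigma^2$ from (70), $\epsilon$ is a quantity which can be made arbitrarily small, and $C_{\mathrm{SCDIV\text{-}JD}}$ is some constant independent of
SNR.

The above results, combined with $\Delta_{\mathrm{OPT\text{-}CCDIV}} = 4p\beta/(p + 2\beta)$, prove the desired result. ■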

References 

[1] W. H. R. Equitz and T. M. Cover, "Successive refinement of information," IEEE Trans. Inform. Theory, vol. 37, pp. 269-275, Mar. 1991.

[2] S. S. Pradhan, R. Puri, and K. Ramchandran, "n-Channel Symmetric Multiple Descriptions, Part II: An Achievable Rate-Distortion Region," IEEE Trans. Inform. Theory, Mar. 2003. Submitted for publication.

[3] R. Venkataramani, G. Kramer, and V. Goyal, "Multiple Description Coding with Many Channels," IEEE Trans. Inform. Theory, vol. 49, Sept. 2003. Accepted for publication. Available online at: http://cm.bell-labs.com/cm/ms/who/gkr/Papers/mdIT03.pdf



February 1, 2008 



DRAFT 



SUBMITTED TO IEEE TRANS. ON INFORM. THEORY 



46 



[4] J. G. Apostolopoulos, W.-T. Tan, S. J. Wee, and G. W. Wornell, "Modeling Path Diversity for Multiple Description Video Communications," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), May 2002.

[5] J. Barros, J. Hagenauer, and N. Gortz, "Turbo Cross Decoding of Multiple Descriptions," in Proc. IEEE Int. Conf. Communications (ICC), vol. 3, (New York, NY), pp. 1398-1402, 20 April - 2 May 2002.

[6] H. Coward, R. Knopp, and S. Servetto, "On the Performance of a Natural Class of Joint Source/Channel Codes based upon Multiple Descriptions," IEEE Trans. Inform. Theory, 2002. Submitted for publication. Available at http://people.ece.cornell.edu/servetto/publications/papers/20010820/

[7] Y. Frank-Dayan and R. Zamir, "Dithered lattice-based quantizers for multiple descriptions," IEEE Trans. Inform. Theory, vol. 48, pp. 192-204, Jan. 2002.

[8] A. R. Reibman, H. Jafarkhani, Y. Wang, M. T. Orchard, and R. Puri, "Multiple-description video coding using motion-compensated temporal prediction," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 193-204, Mar. 2002.

[9] X. Tang and A. Zakhor, "Matching pursuits multiple description coding for wireless video," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 566-575, June 2002.

[10] Y. Wang and S. Lin, "Error-resilient video coding using multiple description motion compensation," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 438-452, June 2002.

[11] M. Alasti, K. Sayrafian-Pour, A. Ephremides, and N. Farvardin, "Multiple Description Coding in Networks with Congestion Problem," IEEE Trans. Inform. Theory, vol. 47, pp. 891-902, Mar. 2001.

[12] J. G. Apostolopoulos and S. J. Wee, "Unbalanced Multiple Description Video Communication using Path Diversity," in Proc. IEEE Int. Conf. Image Processing (ICIP), vol. 1, (Thessaloniki, Greece), pp. 966-969, Oct. 2001.

[13] J. G. Apostolopoulos, "Reliable Video Compression over Lossy Packet Networks using Multiple State Encoding and Path Diversity," in Proc. SPIE Visual Communications and Image Processing (VCIP), vol. 4310, (San Jose, CA), Jan. 2001.

[14] N. At and Y. Altunbasak, "Multiple Description Coding for Wireless Channels with Multiple Antennas," in Proc. IEEE Global Comm. Conf. (GLOBECOM), vol. 3, (San Antonio, TX), pp. 2040-2044, Nov. 2001.

[15] H. Coward, R. Knopp, and S. D. Servetto, "On the Performance of Multiple Description Codes over Bit Error Channels," in Proc. IEEE Int. Symp. Information Theory (ISIT), (Washington, DC), July 2001.

[16] V. K. Goyal and J. Kovacevic, "Generalized Multiple Description Coding with Correlating Transforms," IEEE Trans. Inform. Theory, vol. 47, pp. 2199-2224, Sept. 2001.

[17] V. K. Goyal, "Multiple Description Coding: Compression Meets the Network," IEEE Signal Proc. Mag., vol. 18, pp. 74-93, Sept. 2001.

[18] N. Kamaci, Y. Altunbasak, and R. M. Mersereau, "Multiple Description Coding with Multiple Transmit and Receive Antennas for Wireless Channels: The Case of Digital Modulation," in Proc. IEEE Global Comm. Conf. (GLOBECOM), vol. 6, (San Antonio, TX), pp. 3272-3276, Nov. 2001.

[19] C.-S. Kim and S.-U. Lee, "Multiple description coding of motion fields for robust video transmission," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 999-1010, Sept. 2001.

[20] S. S. Pradhan, R. Puri, and K. Ramchandran, "n-Channel Symmetric Multiple Descriptions, Part I: (n,k) Source-Channel Erasure Codes," IEEE Trans. Inform. Theory, June 2001. Submitted for publication. Available online at: http://www.eecs.umich.edu/~pradhanv/paper/ittrans03_4.ps

[21] R. Venkataramani, G. Kramer, and V. K. Goyal, "Bounds on the Achievable Rate Region for Certain Multiple Description Coding Problems," in Proc. IEEE Int. Symp. Information Theory (ISIT), (Washington, DC), p. 148, June 24-29, 2001.






[22] Y. Wang, M. T. Orchard, V. Vaishampayan, and A. R. Reibman, "Multiple description coding using pairwise correlating transforms," IEEE Trans. Image Processing, vol. 10, pp. 351-366, Mar. 2001.

[23] R. Zamir, "Gaussian Codes and Shannon Bounds for Multiple Descriptions," IEEE Trans. Inform. Theory, vol. 45, pp. 2629-2635, Nov. 1999.

[24] Z. Zhang and T. Berger, "Multiple Description Source Coding with No Excess Marginal Rate," IEEE Trans. Inform. Theory, vol. 41, pp. 349-357, Mar. 1995.

[25] V. A. Vaishampayan, "Design of Multiple Description Scalar Quantizers," IEEE Trans. Inform. Theory, vol. 39, pp. 821-834, May 1993.

[26] Z. Zhang and T. Berger, "New Results in Binary Multiple Descriptions," IEEE Trans. Inform. Theory, vol. 33, pp. 502-521, July 1987.

[27] R. F. Ahlswede, "The Rate-Distortion Region for Multiple Descriptions without Excess Rate," IEEE Trans. Inform. Theory, vol. 31, pp. 721-726, Nov. 1985.

[28] A. A. El Gamal and T. M. Cover, "Achievable Rates for Multiple Descriptions," IEEE Trans. Inform. Theory, vol. 28, pp. 851-857, Nov. 1982.

[29] H. Witsenhausen and A. D. Wyner, "Source Coding for Multiple Descriptions II: A Binary Source," Bell Labs Tech. Rept. TM-80-1217, Dec. 1980.

[30] Q. Zhao and M. Effros, "Lossless and Near-Lossless Source Coding for Multiple Access Networks," IEEE Trans. Inform. Theory, vol. 49, pp. 112-128, Jan. 2003.

[31] G. Caire and S. Shamai (Shitz), "On Achievable Throughput of a Multi-Antenna Gaussian Broadcast Channel," IEEE Trans. Inform. Theory, vol. 49, pp. 1691-1706, July 2003.

[32] L. Li and A. J. Goldsmith, "Capacity and Optimal Resource Allocation for Fading Broadcast Channels - Part I: Ergodic Capacity," IEEE Trans. Inform. Theory, vol. 47, pp. 1083-1102, Mar. 2001.

[33] L. Li and A. J. Goldsmith, "Capacity and Optimal Resource Allocation for Fading Broadcast Channels - Part II: Outage Capacity," IEEE Trans. Inform. Theory, vol. 47, pp. 1103-1127, Mar. 2001.

[34] S. Shamai (Shitz), "A Broadcast Strategy for the Gaussian Slowly Fading Channel," in Proc. IEEE Int. Symp. Information Theory (ISIT), (Ulm, Germany), p. 150, June 29 - July 4, 1997.

[35] T. M. Cover, "Comments on broadcast channels," IEEE Trans. Inform. Theory, vol. 44, pp. 2524-2530, Oct. 1998.

[36] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons, Inc., 1991.

[37] S.-Y. Chung, On the Construction of Some Capacity Approaching Coding Schemes. PhD thesis, Massachusetts Institute of Technology, 2000.

[38] B. Chen and G. W. Wornell, "Efficient Channel Coding for Analog Sources using Chaotic Systems," in Proc. IEEE Global Comm. Conf. (GLOBECOM), vol. 1, (London, UK), pp. 131-135, Nov. 1996.

[39] B. Chen and G. W. Wornell, "Analog Error-Correcting Codes Based on Chaotic Dynamical Systems," IEEE Trans. Commun., vol. 46, pp. 881-890, July 1998.

[40] Z. Reznic, R. Zamir, and M. Feder, "Joint Source-Channel Coding of a Gaussian Mixture Source over the Gaussian Broadcast Channel," IEEE Trans. Inform. Theory, vol. 48, pp. 776-781, Mar. 2002.

[41] U. Mittal and N. Phamdo, "Hybrid Digital-Analog (HDA) Joint Source-Channel Codes for Broadcasting and Robust Communications," IEEE Trans. Inform. Theory, vol. 48, pp. 1082-1102, May 2002.

[42] H. Witsenhausen, "On Source Networks with Minimal Breakdown Degradation," Bell Syst. Tech. J., vol. 59, pp. 1083-1087, July-Aug. 1980.



February 1, 2008 



DRAFT 



SUBMITTED TO IEEE TRANS. ON INFORM. THEORY 



[43] J. Wolf, A. Wyner, and J. Ziv, "Source Coding for Multiple Descriptions," Bell Syst. Tech. J., vol. 59, pp. 1417-1426, Oct. 
1980. 

[44] L. Ozarow, "On a Source Coding Problem with Two Channels and Three Receivers," Bell Syst. Tech. J., vol. 59, pp. 1909-1921,
Dec. 1980.

[45] T. Berger and Z. Zhang, "Minimum Breakdown Degradation in Binary Source Encoding," IEEE Trans. Inform. Theory, 
vol. 29, pp. 807-814, Nov. 1983. 

[46] B. Rimoldi, "Successive Refinement of Information: Characterization of the Achievable Rates," IEEE Trans. Inform. Theory, 
vol. 40, pp. 253-259, Jan. 1994. 

[47] L. H. Ozarow, S. Shamai (Shitz), and A. D. Wyner, "Information Theoretic Considerations for Cellular Mobile Radio,"
IEEE Trans. Veh. Technol., vol. 43, pp. 359-378, May 1994.

[48] L. Zheng and D. N. C. Tse, "Diversity and Multiplexing: A Fundamental Tradeoff in Multiple-Antenna Channels," IEEE 
Trans. Inform. Theory, vol. 49, pp. 1073-1096, May 2003. 

[49] J. N. Laneman, E. Martinian, G. W. Wornell, J. G. Apostolopoulos, and S. J. Wee, "Comparing Application- and
Physical-Layer Approaches to Diversity on Wireless Channels," in Proc. IEEE International Communications Conference (ICC),
May 2003.

[50] T. Linder and R. Zamir, "On the Asymptotic Tightness of the Shannon Lower Bound," IEEE Trans. Inform. Theory, vol. 40, 
pp. 2026-2031, Nov. 1994. 

[51] A. Lapidoth, "On the Role of Mismatch in Rate Distortion Theory," IEEE Trans. Inform. Theory, vol. 43, pp. 38-47, Jan.
1997.

[52] D. Sakrison, "Worst Sources and Robust Codes for Difference Distortion Measures," IEEE Trans. Inform. Theory, vol. 21,
pp. 301-309, May 1975.

[53] R. Zamir and U. Erez, "A Gaussian Input Is Not Too Bad," IEEE Trans. Inform. Theory, vol. 50, pp. 1362-1367, June 2004.

[54] E. Martinian, "Waterfilling Gains O(1/SNR) at High SNR," unpublished notes available from
http://www.csua.berkeley.edu/~emin/research/wfill.pdf

[55] J. N. Laneman and G. W. Wornell, "Distributed Space-Time-Coded Protocols for Exploiting Cooperative Diversity in Wireless
Networks," IEEE Trans. Inform. Theory, vol. 49, pp. 2415-2425, Oct. 2003.

[56] Z. Wang and G. B. Giannakis, "What Determines Average and Outage Performance in Fading Channels?," in Proc. IEEE
Global Comm. Conf. (GLOBECOM), (Taipei, Taiwan), Nov. 2002.


